Foreword

This 5th Status and Progress Report comes at a time when we are all thinking of the long term future of the National
Center for Voice and Speech. Many things are changing right now in the health science arena. All of us are familiar, of
course, with the day to day developments of President Clinton’s health care package. None of us know exactly what the
impact will be on the long term welfare of biomedical research, but we are all hoping that NIH funding will continue at
a healthy pace. Last year the NIDCD budget was reduced from its previous year for the first time ever, which made it very
difficult to obtain new grant applications. Recent announcements indicate, however, that there may be an increase in the
NIDCD budget of approximately 5 percent for the coming year. This is good news, given the very tight fiscal policies that
Congress is working under these days.

One thing is certain: each year we have to become more clever as researchers. The Advisory Board to NIDCD
wishes  us  to  get  deeper  into  the  molecular  structure  of  all  the  organs  of  the  human  body  involved  in  speech  communication. 
At  the  same  time,  they  wish  us  to  understand  the  whole  body  as  a  system.  Yet  it  is  becoming  more  and  more  difficult  to 
do  invasive  procedures,  either  on  humans  or  on  animals.  This  means  that  the  critical  data  that  we  all  need  have  to  come 
from  very  carefully  conducted  experiments,  those  that  have  a  high  benefit  to  risk  (or  cost)  ratio.  On  the  one  hand,  we  need 
large  numbers  of  human  subjects  or  animals  to  make  our  results  statistically  reliable;  on  the  other  hand,  we  need  to  conserve 
and protect humans and animals involved in research. This puts all of the pressure on the experimenter to obtain only those
pieces  of  information  that  are  absolutely  vital  and  then  to  integrate  the  fragments  in  the  most  clever  ways.  I  hope  that  our 
research  will  show  that  trend. 

From  a  publication  standpoint,  this  5th  Status  and  Progress  Report  has  been  arranged  in  a  two-column  format  to 
be a little more compatible with typical journal papers. I am extremely proud of our staff here at the Iowa office for spending
time  and  effort  to  make  the  report  readable  and  appealing  to  you  all.  Special  thanks  go  to  Julie  Lemke,  Julie  Ostrem,  and 
Marty  Milder  who  have  contributed  substantively  to  the  success  of  the  reports. 

Ingo  R.  Titze,  Director 
November,  1993 
Levator Veli Palatini Muscle Activity in
Relation to Intraoral Air Pressure Variation

David P. Kuehn, Ph.D.

Department of Speech and Hearing Sciences, The University of Illinois at Urbana-Champaign

Jerald B. Moon, Ph.D.

Department of Speech Pathology and Audiology, The University of Iowa


Abstract 

The purpose of this investigation was to study the
operating range of the levator veli palatini muscle for a
nonspeech task (blowing) and to determine where in that
range levator activity for speech lies. Ten adult subjects
without speech or velopharyngeal abnormalities participated.
Levator EMG activity for speech occurred in the
lower  region  of  the  total  range  for  blowing.  In  two 
subsequent  experiments  involving  a  subset  of  four  subjects, 
it  was  found  that  overall  effort  may  have  had  a  small  effect 
on  levator  activity  apart  from  its  role  in  velopharyngeal 
closure  for  aerodynamic  purposes.  The  results  of  the  main 
experiment  are  discussed  in  relation  to  the  concept  of 
threshold  of  fatigue  as  it  may  influence  velopharyngeal 
control  mechanisms. 


Certain muscles that are used for speech production
appear to have the capacity to produce forces that far
exceed those necessary or typically used for speech.
Examples of such muscles include the masseter and temporalis
used for jaw elevation (Folkins, 1981). It has been reported
that levels of lip muscle force used for speech are only about
10-20% of maximum lip forces attainable (Barlow & Abbs,
1983). With regard to respiratory activity afforded by the
powerful abdominal and chest wall muscles, humans are
capable of generating airway pressures much greater than
those typically used for speech. Cook, Mead, and Orzalesi
(1964) found that a group of normal young adult females
and males generated average maximum pulmonary pressures
of 146 and 237 cm H2O, respectively. Kent, Kent, and
Rosenbek (1987) reviewed additional studies reporting
similar values. These values are many times greater than
the approximately 6-10 cm H2O typically achieved during
speech.

Playing a wind instrument may require a great deal
more intraoral air pressure than that for speech. Bouhuys
(1977) indicated that trumpet players often expend up to
150 cm H2O when playing tones that are high in frequency
and intensity. Bless, Ewanowski, and Dibble (1983) reported
air leakage through the velopharyngeal port for 10
subjects playing wind instruments who sought help for this
problem. Only three of the subjects exhibited air leakage
for speech, although only after extended practice on their
musical instruments. It appears that in the subjects described
by Bless et al. (1983), velopharyngeal muscle
activity was sufficient for the pressure demands of speech
but not for the higher pressure demands associated with
playing a musical instrument. It is possible that for certain
individuals, such as those with a history of cleft palate,
muscle strength may not even be sufficient to meet the
pressure demands for speech.

In discussing maximum performance tests of
speech production, Kent et al. (1987) noted that a reduced
reserve capacity can impair a speaker’s “flexibility” and
might also render speech a taxing activity for that individual.
Although the authors discussed global activities
such as maximum expiratory pressures without addressing
underlying anatomic and physiologic mechanisms, it is
likely that considerations of prevailing level of activity in
relation to reserve capacity apply to individual muscles as
well as whole systems. That is, using a muscle at or near its
maximum activation level for speech may be taxing for that
individual muscle and may render movement of the structure,
of which the muscle is a part, an unduly effortful
process.
It is not known where in its total operating range
each individual muscle that influences the airway functions
during speech. We were interested in studying the operating
range of the levator veli palatini muscle and in determining
where in that range its activity for speech lies. We
were motivated by a therapy procedure that we are utilizing
to strengthen the muscles of velopharyngeal closure (Kuehn,
1991).

The therapy procedure described by Kuehn (1991)
involves introducing a positive air pressure into the nasal
cavities using a method commonly referred to as CPAP
(continuous positive airway pressure) that is used to treat
patients with sleep apnea. The positive air pressure
provides a resistance against which the velopharyngeal closure
muscles  must  work.  We  have  shown  in  a  previous  study 
(Kuehn,  Moon,  and  Folkins,  1993),  that  levator  muscle 
activity  increases  with  heightened  nasal  cavity  air  pressure 
in  subjects  with  and  without  cleft  palate.  However,  we  did 
not  determine  in  that  study  the  level  of  levator  activity  for 
speech  in  relation  to  its  total  range  for  either  group  of 
subjects. 

Given that blowing requires an airtight
velopharyngeal seal regardless of the level of intraoral air
pressure generated, we chose that task in an attempt to
activate the levator muscle over its widest operating range.
The data from normal subjects pertaining to levator operating
ranges obtained in the current study will be used as a
basis  for  comparison  with  similar  information  obtained 
from  individuals  with  velopharyngeal  abnormalities  in  a 
follow-up  study. 

Three  experiments  involving  normal  subjects  were 
designed  to  answer  the  following  questions: 

1)  What  is  the  activity  range  for  the  levator  veli 
palatini  muscle  for  a  nonspeech  task  involving  blowing? 

2)  What  is  the  activity  range  for  the  levator  veli 
palatini  muscle  for  speech  in  relation  to  the  range  for  the 
blowing  task? 

3)  What is the activity range for two control
muscles, the masseter and sternocleidomastoideus, during
the blowing task?

4)  What is the activity range for the levator veli
palatini and one control muscle, the sternocleidomastoideus,
during a speech loudness task?

Method 

Subjects 

The subjects for Experiment 1 were five men and
five women in the third, fourth, or fifth age decade. The
subjects exhibited normal oral/nasal resonance balance and
reported no history of speech, language, or hearing disorders.
This was verified by the investigators at the time of
testing. Four of the subjects, two male (S1 and S2) and two
female (S3 and S4), participated in a second and third
experiment in addition to Experiment 1.


Experiment  1 

The  tasks  for  Experiment  1  are  summarized  in 
Table  1  and  identified  more  explicitly  in  Table  2.  Both  the 
speech  and  nonspeech  tasks  were  intended  to  elicit  a  range 
of  activity  for  the  levator  veli  palatini  muscle,  from  minimal 
to  maximal  levels.  Subjects  produced  10  repetitions  of  the 
speech  and  blowing  tasks  and  several  repetitions  of  the 
voluntary  velar  elevation  and  swallowing  activities. 

The subject was seated upright in a dental chair
and the oral cavity was sprayed lightly with 4% lidocaine
topical anesthetic. Stainless steel wire electrodes, 110 µm
in diameter, were used for recording EMG activity from the
levator muscle. The wires were inserted perorally into the
muscle using 1/2 inch 30 gauge hypodermic needles. The
needles were inserted at an angle following the course of the
levator muscle, that is, in a superior, lateral, and posterior
direction. The two wires for bipolar recording were placed
approximately 4 mm apart and 10 mm deep into the levator
muscle on the subject’s right side. Placement criteria
included EMG activity that was observed in association
with sustained [s] production. The EMG signals were
amplified using Biocommunications Electronics preamplifiers
(model 301) and amplifiers (model 205).

During the blowing tasks, a segment of a polyethylene
tube, 13 cm in length with a 1.77 mm inner diameter
and 2.80 mm outer diameter, was inserted into the mouth to
serve as a shunt in parallel with the pressure-sensing tube of
the same diameters. The shunt tube enabled constant and
predetermined oral pressures to be generated. Intraoral air
pressure was sensed with a Honeywell Microswitch pressure
transducer (model 162PC01D), amplified with a
Biocommunications Electronics amplifier (model 205),


Table 1.

Summary of tasks for Experiment 1.

Tasks are identified by letter in Table 2.

SPEECH

“Say ___ again.”

[m]      [sis]

[mam]    [sus]

[mim]    [pʌ]

NONSPEECH

1.  Blowing: Intraoral air pressure values in cm H2O

 5    40

10    50

20    60

30    max

2.  Voluntary Velar Elevation

3.  Swallowing on Command
Table 2.

Key for identification of tasks in Experiment 1 for which
levator veli palatini muscle activity was measured. Letters a-r
represent the 18 speech tasks measured. Letters s-z represent
the blowing tasks, letter A is the voluntary velar elevation task,
and B is the swallowing task.


SPEECH

Peak activity for [s] in “say” in:

a)  say [m] again      d)  say [sis] again

b)  say [mam] again    e)  say [sus] again

c)  say [mim] again    f)  say [pʌ] again

Prevailing activity level for the utterance:

g)  ... [m] ...        j)  ... [sis] ...

h)  ... [mam] ...      k)  ... [sus] ...

i)  ... [mim] ...      l)  ... [pʌ] ...

Peak activity for [g] in “again” in:

m)  say [m] again      p)  say [sis] again

n)  say [mam] again    q)  say [sus] again

o)  say [mim] again    r)  say [pʌ] again

NONSPEECH

Blowing: Intraoral air pressures in cm H2O

s)  5     w)  40

t)  10    x)  50

u)  20    y)  60

v)  30    z)  max

A.  Voluntary velar elevation on command.

B.  Swallowing water from a cup.


and  displayed  on  one  channel  of  an  oscilloscope  (Tektronix 
model  2214). 

The subject was shown the horizontal axis on the
oscilloscope with which each target pressure coincided. He
or she was instructed to maintain the target pressure by
keeping the oscilloscope beam on the appropriate horizontal
axis while blowing on the shunt tube. Target pressures
were elicited in the order 10, 5, 20, 30, 40, 50, 60 cm H2O,
and the maximum pressure that the subject could generate.
The maximum pressure varied across subjects and was not
controlled.

After the blowing tasks, subjects produced the
speech tasks listed in Tables 1 and 2. The speech samples
were produced in the carrier phrase “say ___ again.”
Sequencing of the speech tasks was randomized across
subjects. The audio signal from a dynamic microphone was
amplified using a Nakamichi preamplifier and Tascam tape
recorder (model 22-4).

Subjects also were asked to swallow water from a
cup at various times throughout the experiment. The
experiment concluded by eliciting each subject’s voluntary
velar elevations. Subjects were provided with audio feedback
of the amplified levator veli palatini electromyographic
interference patterns to assist them in voluntary
velar elevations. Two (S8 and S10) of the ten subjects could
not perform voluntary velar elevations even with such
feedback.

Generalized  Physiologic  Effort 

A subset of four subjects performed two additional
tasks in two subsequent experiments to estimate the possible
effects of overall effort on levator muscle activity
apart from the functional demands on the muscle to close
the velopharyngeal orifice for aerodynamic purposes. It
was reasoned that overall effort, especially at higher intraoral
air pressure levels, conceivably could elevate activity of
muscles in the head and neck region even if those muscles
were not immediately bordering the airway. For example,
those muscles could provide a stabilizing force that might
increase activity level linearly with increases in intraoral air
pressure. Thus, we wanted to account for the possibility that
increases in levator activity might be due predominantly to
overall physiologic effort, thereby giving an otherwise false
impression of its activity in relation to velopharyngeal
closure for the blowing task. Experiments 2 and 3 were
conducted for that purpose.

Experiment  2 

Activation levels of the masseter and
sternocleidomastoideus muscles were sampled. The masseter
was chosen because it can move the jaw during speech
although it is not an “obligatory” muscle. That is, people
can produce speech with the jaw immobilized (Lindblom,
Lubker, & Gay, 1979). In a similar fashion, the
sternocleidomastoideus is often regarded as an “accessory”
muscle of inhalation (Hixon, 1973) and may therefore assist
in respiratory activity for speech, but generally is not
regarded as “obligatory” for that purpose. Moreover,
neither muscle immediately borders the airway. Therefore,
they were felt to be suitable as neutral muscles of the head
and neck for the purpose of assessing the effects of the range
of effort associated with the blowing task.

Pairs of surface electrodes (Beckman Ag-AgCl 11
mm diameter disks) were attached with adhesive collars to
the skin overlying the masseter and sternocleidomastoideus
muscles of each of the four subjects. Proper placement of
these electrodes was assessed by having the subjects clench
the teeth for masseter and turn the head to the opposite
side for sternocleidomastoideus.

The session began with two separate recordings,
first with the subject clenching his or her teeth with maximum
effort and then rotating the head maximally to the side
opposite the sternocleidomastoideus muscle from which
the recording was obtained. This provided maximum EMG
activity levels for the two muscles against which the
blowing activities could be compared. The subjects then
blew through the shunt tube as in Experiment 1 but only at
target values in the sequence 10, 30, 50 cm H2O and maximum
effort. Each task was repeated 10 times. All instrumentation
and data collection procedures were the same as
those for Experiment 1.

Experiment 3

Experiment 3 involved the same subset of four
subjects. These subjects produced vowels at three different
loudness levels. Although increased loudness is associated
with greater overall effort, for example in respiratory drive,
it is not accompanied by greater intraoral air pressure
because vowels are produced with an unoccluded oral
cavity. Therefore, any increase in levator veli palatini
activity with loudness would logically be attributable to
some aspect or aspects of physiologic effort but not to
increased intraoral air pressure demands.

Two muscles, the levator and
sternocleidomastoideus, were sampled. The latter muscle
was included again as a neutral muscle as described for
Experiment 2. Hooked wire electrodes were inserted into
the levator veli palatini muscle on each subject’s right side.
Surface electrodes were affixed to the skin overlying the left
sternocleidomastoideus muscle.

The  subjects  sustained  each  of  the  vowels  [i,a,u]  at 
a  normal  loudness  level,  louder  than  normal,  and  at  their 
loudest  level.  They  were  instructed  to  produce  the  loudest 
vowels  that  they  could,  but  without  pain  or  strain.  The 
sequence  of  the  nine  tasks  (3  vowels  X  3  loudness  levels) 
was  random  across  subjects. 
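The randomized sequencing of the nine vowel-by-loudness tasks can be illustrated with a minimal sketch. The function name, the loudness labels, and the per-subject seed are assumptions introduced for illustration; the text does not describe how the random order was actually generated.

```python
import itertools
import random

VOWELS = ["i", "a", "u"]
# Labels are illustrative stand-ins for the three loudness conditions.
LOUDNESS_LEVELS = ["normal", "louder", "loudest"]


def randomized_task_order(seed):
    """Return the nine (vowel, loudness) tasks in a shuffled order.

    The seed (for example, one per subject) is a hypothetical device that
    makes the order reproducible; it is not part of the reported method.
    """
    tasks = list(itertools.product(VOWELS, LOUDNESS_LEVELS))
    rng = random.Random(seed)
    rng.shuffle(tasks)
    return tasks


if __name__ == "__main__":
    for subject in ("S1", "S2", "S3", "S4"):
        print(subject, randomized_task_order(subject))
```

Each subject receives all nine tasks exactly once, in an order that differs from subject to subject.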

To ensure that the subjects were in fact increasing
their output level across loudness conditions, a sound
pressure level meter (Bruel & Kjaer, 2209, set to the A
scale) was used to measure the level of their vowel productions.
The meter microphone was placed 2 1/2 ft from the
subject’s mouth. All other instrumentation and data collection
procedures were the same as those for Experiments 1 and
2.


Experiment  1 

Figure 1 shows an example of rectified and
smoothed levator veli palatini EMG activity and intraoral
air pressure trace during a blowing task. A similar EMG
trace was observed during voluntary elevations. For each
of these tasks, average EMG activity within a 1-sec segment
characterized by relatively stable EMG was chosen to
represent prevailing levator EMG activity.
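As an illustration of the 1-sec averaging step, the sketch below picks the 1-sec window with the smallest standard deviation and returns its mean. The minimum-variance criterion and the function name are assumptions; the text states only that a relatively stable segment was chosen, and that choice may well have been made by inspection.

```python
import numpy as np


def stable_segment_mean(emg, fs=1000, window_s=1.0):
    """Return (start_index, mean) of the most stable window in a trace.

    `emg` is a rectified and smoothed EMG trace sampled at `fs` Hz.
    "Most stable" is taken here to mean lowest standard deviation,
    which is a hypothetical stand-in for the selection in the text.
    """
    emg = np.asarray(emg, dtype=float)
    n = int(round(window_s * fs))
    if emg.size < n:
        raise ValueError("trace is shorter than the analysis window")
    # Every contiguous window of n samples, as a read-only view.
    windows = np.lib.stride_tricks.sliding_window_view(emg, n)
    start = int(np.argmin(windows.std(axis=1)))
    return start, float(windows[start].mean())


if __name__ == "__main__":
    # Synthetic token: rise, 1.5 s plateau, fall.
    trace = np.concatenate([
        np.linspace(0.0, 2.0, 500),
        np.full(1500, 2.0),
        np.linspace(2.0, 0.0, 500),
    ])
    print(stable_segment_mean(trace))
```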

An  example  of  measures  obtained  from  a  speech 
task  is  presented  in  Figure  2.  Three  measures  of  activation 
level  for  levator  were  obtained:  1)  peak  level  for  [s]  in  the 
carrier  word  “say,”  2)  prevailing  level  during  the  target 
utterance,  and  3)  peak  level  for  [g]  in  the  carrier  word 
“again.”  For  swallowing,  peak  activation  levels  were 
recorded. 

Levator EMG activation levels were normalized
within each subject. The largest peak EMG value recorded
during the blowing tasks by a subject was used as a reference
for that subject and was assigned a value of 100%. All other
EMG values recorded for that subject were referenced to the
maximum value.
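The within-subject normalization can be sketched as follows. The function name, the dictionary layout keyed by the task letters of Table 2, and the numeric values are illustrative assumptions, not data from the study.

```python
def normalize_to_blowing_peak(values_by_task, blowing_tasks):
    """Express one subject's EMG values as a percentage of the largest
    value recorded during that subject's blowing tasks (set to 100%)."""
    reference = max(values_by_task[task] for task in blowing_tasks)
    if reference <= 0:
        raise ValueError("blowing reference must be positive")
    return {task: 100.0 * value / reference
            for task, value in values_by_task.items()}


if __name__ == "__main__":
    # Made-up values in arbitrary units for one hypothetical subject.
    raw = {"g": 0.5, "s": 1.0, "z": 4.0, "B": 8.0}
    print(normalize_to_blowing_peak(raw, blowing_tasks=["s", "z"]))
```

Because the reference is taken from the blowing tasks only, a nonblowing task such as swallowing can exceed 100%.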
Figure 1. Rectified and smoothed EMG activity from the levator veli
palatini muscle (top) and intraoral air pressure trace (bottom) for a
blowing task. An interval of 1.0 sec in the mid portion of each token of the
blowing tasks was sampled and analyzed as a representative measure of
EMG activity for that token.


Data  Analysis 

EMG activity, intraoral air pressure, and audio
signals were monitored on an oscilloscope (Tektronix
model 5111A) and recorded on a Sony digital instrumentation
recorder (model PC108M). Subsequently, EMG signals
were full-wave rectified and smoothed with a 40 ms
time constant. Intraoral air pressure signals also were
smoothed with a 40 ms time constant. Rectified and
smoothed EMG signals, smoothed pressure signals, and the
audio signal were digitized at 1000 samples per second
using a laboratory computer and commercially available
analog-to-digital conversion software. Data then were
displayed and analyzed using custom graphics and analysis
routines.
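For readers who wish to reproduce this conditioning digitally, a minimal sketch of full-wave rectification followed by first-order smoothing with a 40 ms time constant is given below. It is a software analogue under stated assumptions: the text indicates that the signals were rectified and smoothed before digitization, and the first-order (single-pole) form of the smoother and the function name are assumptions.

```python
import math


def rectify_and_smooth(signal, fs=1000.0, tau=0.040):
    """Full-wave rectify a signal and smooth it with a first-order
    low-pass filter of time constant `tau` (seconds) at sample rate `fs`."""
    # Per-sample smoothing coefficient of a single-pole filter.
    alpha = 1.0 - math.exp(-1.0 / (fs * tau))
    smoothed = []
    y = 0.0
    for x in signal:
        y += alpha * (abs(x) - y)  # abs() performs the full-wave rectification
        smoothed.append(y)
    return smoothed


if __name__ == "__main__":
    # A signal alternating between +1 and -1 rectifies to a constant 1,
    # so the output rises toward 1 with the chosen time constant.
    raw = [1.0 if i % 2 == 0 else -1.0 for i in range(1000)]
    out = rectify_and_smooth(raw)
    print(round(out[39], 3), round(out[-1], 3))
```

After one time constant (40 samples at 1000 samples per second) the output has covered about 63% of the step, which is the defining property of the time constant.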
Figure 2. Audio trace (top) and rectified and smoothed EMG activity from
the levator veli palatini muscle (bottom). P1 = peak EMG for [s] in the
carrier word “say.” P2 = prevailing EMG level during the target word in
the carrier phrase. P3 = peak EMG for [g] in the carrier word “again.”
A mixed model analysis of variance with one
random  factor  (subjects)  and  one  fixed  factor  (task)  was 
used  to  assess  the  effects  of  the  various  speech  and  nonspeech 
tasks  on  levels  of  levator  muscle  activity.  Designed  a  priori 
comparisons  were  made  between  the  blowing,  speech, 
voluntary  elevation,  and  swallowing  tasks.  The  mean 
differences  among  these  task  groups  were  estimated  and 
tested  against  zero  using  an  alpha  level  of  0.05.  Finally, 
multiple  comparisons  among  the  eight  blowing  tasks,  18 
speech  tasks,  voluntary  elevation,  and  swallowing  were 
performed  using  the  Scheffe  procedure. 

Experiment  2 

Data analysis for Experiment 2 was similar to that
employed for Experiment 1. Normalization within subjects
was conducted separately for each muscle (masseter and
sternocleidomastoideus). Activation levels recorded during
maximal effort tasks (teeth clenching for masseter and
head turning for sternocleidomastoideus) were used as
reference levels. As in Experiment 1, average EMG activity
within a 1-sec segment characterized by relatively stable
EMG activity was recorded. In instances for which no
readily identifiable EMG activity could be detected, a 1-sec
segment characterized by stable intraoral air pressure was
chosen.

A mixed model analysis of variance with one
random factor (subjects) and one fixed factor (pressure
level) was used to assess the effects of pressure level on
muscle activation level. Separate analyses were conducted
for the masseter and sternocleidomastoideus muscles. Post
hoc analyses involved Bonferroni multiple comparisons.

Figure 3. Levator veli palatini EMG activity (95% confidence intervals) for
speech and nonspeech tasks; male subjects. Tasks are identified by letter
in Table 2. * Subject 5 activity for swallowing exceeded 200% of that for
his overall peak blowing activity.

Experiment  3 

For each vowel prolongation, average activation
levels were measured for 1-sec segments characterized by
relatively stable levator and sternocleidomastoideus EMG
activity. Maximal activation tasks were not recorded for
these muscles in this experiment. Therefore, EMG values
recorded in Experiment 3 were not normalized and are
reported in arbitrary units.

A mixed model analysis of variance was used to
assess the effects of vowel loudness on activation levels of
the levator and sternocleidomastoideus muscles. A separate
analysis was conducted for each vowel. The model
included vowel and loudness-within-vowel as fixed factors.
Random factors included subjects, subject-by-vowel, and
subject-by-loudness-within-vowel interactions. Bonferroni
multiple comparisons were used to assess muscle activation
differences across loudness levels within each vowel.

Results 

Experiment  1 

Figures 3 and 4 show 95% confidence intervals
representing levator veli palatini muscle activation levels
for the speech and nonspeech tasks for the 10 subjects.
Speech tasks are identified by the letters a-r, blowing by the
letters s-z, voluntary velar elevation by the letter A, and
swallowing by the letter B (see Table 2).

Figure 4. Levator veli palatini EMG activity (95% confidence intervals) for
speech and nonspeech tasks; female subjects. Tasks are identified by letter
in Table 2.

EMG measures for the speech tasks are arranged
according to increasing levels of levator EMG activity
within subjects. Although the exact sequence varied across
subjects, the nasal contexts are generally nearer the lower
end and the stop and fricative contexts are generally at the
higher end of the activity range for speech, as expected.

The levator activity range across the speech tasks
for each subject appears to be rather continuously variable
without obvious discontinuities in the function for most
subjects. Possible exceptions to this general statement
might be for Subjects 1 and 5. The speech function for
Subject 1 appears to be somewhat trimodal with the nasal
consonant [m] (represented by the letter g in Figure 3) at the
lowest level, vowels within the nasal context (letters h and
i) somewhat higher, and all of the other sounds higher still.
Subject 5 shows an abrupt change in levator EMG activity
for the nasal consonant (letter g) versus all other speech
sounds for which levator activity is higher. Although not a
major focus of this study, the data for speech in Figures 3
and 4 are more consistent with a velopharyngeal mechanism
that is under continuously variable control (Kent,
Carney, & Severeid, 1974; Lubker, 1975) rather than binary
control (Moll & Daniloff, 1971; Moll & Shriner, 1967).

For the blowing task, levator activity increased in
a monotonic fashion for most subjects from the lowest
intraoral air pressure generated (represented by letter s in
Figures 3 and 4) to the highest intraoral air pressure (letter
z). Table 3 shows the results of the analysis of variance
comparing blowing tasks for the data grouped across subjects.
For most comparisons, an increase in intraoral
pressure was associated with a significant (p < 0.05)
increase in levator activity. For example, levator activity
was significantly greater at 20 cm H2O compared to 5 cm
H2O, greater at 30 cm H2O compared to 10 cm H2O, etc.

Overall, significantly (p < 0.0001) greater levels
of levator EMG activity were observed during blowing than
during speech. Table 4 shows the results of the individual
comparisons for blowing versus speech tasks. Across
subjects, levator activity was significantly (p < 0.05) greater
for blowing tasks at and above 20 cm H2O compared to all
18 speech tasks. For the blowing tasks at 5 and 10 cm H2O,
levator activity was significantly (p < 0.05) greater than that
for speech for 5 and 6, respectively, of the 18 speech tasks.


Table 3.

Results of analysis of variance comparing levator muscle
activity for the blowing tasks. * p < 0.05. s-z = intraoral
air pressure values in cm H2O at 5, 10, 20, 30, 40, 50, 60, and
maximum, respectively.


Table 4.

Results of analysis of variance comparing levator muscle
activity for blowing tasks to that for speech tasks across subjects.
Ratios in right column indicate number of significant differences
(p < 0.05) for each blowing task compared to the 18 speech tasks.
Blowing tasks are expressed in cm H2O of intraoral air pressure.
Speech tasks are identified in Table 2.

Blowing Task    Significant versus Speech Tasks
Figures 3 and 4 also show the 95% confidence
intervals for levator activity associated with swallowing
and voluntary velar elevation compared to that for the
speech and blowing tasks. Across subjects, significantly (p
< 0.0001) greater levels of levator EMG were observed
during swallowing than during all three of the other tasks:
speech, blowing, and voluntary velar elevation. However,
this effect was due primarily to the much greater levels
associated with swallowing for S1 and S5. Levator activity
levels for swallowing were less than those for blowing for
several of the other subjects. Overall, significantly (p <
0.0001) less levator activity was observed during voluntary
velar elevation than during blowing, but there was not a
significant difference between voluntary elevation and
speech.

Experiment 2

Figure 5 provides information about whether effort
level in the subset of four subjects had an effect on
muscles in the head and neck region that do not directly
border the vocal tract. The figure shows normalized mean
levels of EMG activity for the masseter and
sternocleidomastoideus muscles as a function of intraoral
air pressures generated in the blowing tasks. The EMG
levels are expressed as percentages of the maximum level
of activity within each muscle as determined by maximal
teeth clenching and head turning maneuvers.
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Figure  5.  Mean  and  standard  deviation  values  for  masseter  and 
stemocUdomastoideus  (SCM)  EMC  activity  levels  versus  intraoral  air 
pressure  levels. 


Figure 6. Mean and standard deviation values for sound pressure level in
dB (A scale) versus subjective loudness levels for the three vowels [i, a, u].


For the masseter muscle, comparisons for EMG
activity between any two intraoral air pressure conditions
were significantly (p < 0.0001) different from each other.
For the sternocleidomastoideus muscle, four of the six paired
comparisons between pressure conditions were significantly
(p < 0.008) different from each other.
Sternocleidomastoideus muscle activity for comparisons
between 30 and 50 cm H2O versus maximum intraoral air
pressure was not significantly different. Thus, most of the
paired comparisons across pressure conditions for the two
muscles were significantly different from each other,
suggesting a general increase in muscle activity with increased
blowing effort. However, relative levels of activity
remained at a low level (below 25% of maximum) for both
muscles even at the highest levels of intraoral air pressures
generated.

Experiment 3

Figures  6  and  7  show  the  results  of  the  vowel 
loudness  experiment  for  the  subset  of  four  subjects.  Figure 
6  shows  that  the  subjects  did  increase  sound  pressure  levels 
in  association  with  their  subjective  increases  in  vowel 
loudness.  All  comparisons  across  loudness  levels  were 
significantly  (p  <  0.0001)  different  from  each  other. 

EMG activity levels for the levator muscle are
shown in Figure 7A and those for the sternocleidomastoideus
muscle are shown in Figure 7B in relation to the loudness
changes. Across loudness levels for the levator muscle,
only the comparison involving the vowel [a] for the normal
versus the loudest production reached statistical significance
(p < 0.006). For the sternocleidomastoideus muscle,
none of the comparisons across loudness levels reached
statistical significance for any of the three vowels. Thus, in
general, increases in loudness did not have a strong effect
on activation levels of the two muscles examined.

Figure 7. Mean and standard deviation values for EMG activity level in
arbitrary units versus subjective loudness levels for the three vowels
[i, a, u]. A. Levator veli palatini muscle. B. Sternocleidomastoideus
muscle.


Discussion 

The primary purpose of this study was to obtain
information about the range of levator veli palatini muscle
activity in normal speakers and to determine where in that
range the activity for speech lies. It was assumed that
utilizing a nonspeech task such as blowing, which requires
tight velopharyngeal closure, would activate the levator
muscle over its widest range. Although we were primarily
interested in levator activity related to velopharyngeal
functioning, we wanted to account for the possibility that
overall physiologic effort might influence levator activity
apart from its more direct role in relation to the control of
intraoral air pressure and the direction of the airstream
during speech. Experiments 2 and 3 were conducted for that
purpose.

The results of Experiments 2 and 3 suggest that
overall effort may have had some small effect on levator
activity apart from aerodynamic demands. This follows
from the increases observed in the masseter and
sternocleidomastoideus muscles with increases in intraoral
air pressure in Experiment 2. However, the activity
observed in these muscles during the blowing task was of a
very low magnitude. Moreover, increases in loudness did
not have a strong effect on activation levels of either the
levator or the sternocleidomastoideus muscles in
Experiment 3. Therefore, we conclude that the variability
attributable to overall effort is minimal and that most of the
variance in levator activity is related directly to its role in
providing closure of the velopharyngeal port.

A major finding in this study was that levator
muscle activity for speech tended to occur in the lower
region of the total range for blowing. Across subjects, all
levator activity levels for blowing at or above 20 cm H2O
intraoral air pressure were greater than levator activity
levels observed during speech tasks. These results are
interesting in view of the fact that intraoral air pressure
needs for normal conversational speech are generally below
20 cm H2O.

These results may have different explanations
depending on the type of neuromuscular control acting on
the velopharyngeal mechanism. For example, the increases
in levator activity in the blowing task could be related to
reflexive activity. In this fashion, following initial velar
elevation, the levator muscle could be functioning in a
largely reactive manner. This appears unlikely, however,
because levator activity rose at the onset rather than
following intraoral air pressure changes and remained fairly
constant throughout each individual blowing maneuver
(see Figure 1). Therefore, although we cannot be certain
about a cause-effect relation, it appears more likely that
increases in levator activity in relation to blowing with
different levels of intraoral air pressure are planned by the
motor mechanism and do not rely on reflexive control.

Regardless of whether reflexive control or other
more automatic peripheral adjustments occurred, the levator
activity levels for speech in relation to the total range for
blowing suggest a relatively low effort on the part of the
levator muscle during speech. This is consistent with the
general concept that normal speech does not require a great
deal of effort. Yet it is possible that because speech and
nonspeech tasks are qualitatively different they draw upon
different neuromuscular control mechanisms. Thus, there
could be different maxima for levator activity for different
tasks.

For nonrepetitive, nonsustained activity such as
swallowing, maximal activation of the levator muscle may
be a reasonable strategy, as observed in some subjects in this
study, to ensure the tightest velopharyngeal closure for each
swallow to prevent nasal regurgitation. However, for
repetitive activity such as speech, or sustained activity such
as blowing, functioning at maximum level would appear to
be a poor strategy because of the likelihood of fatigue.
Although the distribution of muscle fiber types in the
normal human adult velopharyngeal muscles is not known,
it is likely that there is a mixture of Type I and Type II fibers
present and the muscles would be susceptible to fatigue
owing to the Type II fibers (Johnson, Polgar, Weightman,
& Appleton, 1973). Thus, to prevent fatigue, it appears
parsimonious for the muscle to function nearer the lower
end of its operating range for repetitive and sustained
activities. Also, it is possible that the structure that is being
moved, in this case the velum, may be apt to reach its
intended target more consistently and in a timelier fashion
than if the underlying muscle is overtaxed to the level
approaching fatigue.

Mundale  (1970),  in  a  study  involving  hand  grip 
strength  and  fatigue,  found  that  fatigue  (decrease  in  force 
generating  capacity)  was  clearly  evident  with  hand  grip 
maneuvers  at  20%  of  maximum  force  with  periods  of 
relaxation  alternating  with  periods  of  muscle  contraction. 
Above  20%  of  maximum  force,  the  duration  of  each 
intermittent  contraction  was  important  for  total  endurance 
(resistance  to  fatigue)  but  under  20%,  the  duration  of  each 
contraction was less important.

Bystrom and Kilbom (1990) provided additional
information about the interaction between “intensity” (force
generation) and muscle contraction time. They also
measured handgrip force and included EMG recording of the
extensor digitorum communis as one index of handgrip
fatigue. They defined “local fatigue” on the basis of
combined measures of blood flow in the forearm, EMG, and
subjective ratings. They found that at continuous
contractions of 10%, 25%, and 40% of maximum voluntary
contraction (MVC), local fatigue in the forearm was
evident. Intermittent exercises at 10% MVC with 2, 5, or 10
sec of relaxation alternating with 10 sec of contraction did
not lead to fatigue, nor did 5 or 10 sec of relaxation
alternating with 10 sec of contraction at 25% MVC. They
found the threshold of fatigue to be 16.7% MVC, which was
the product of the time and intensity ratio. For example,
with a contraction time of 7 sec and relaxation of 3 sec
(7/10 = 0.7) and an intensity of 20% MVC, the product of
these values equals 14% (0.7 x 20%) and is below the
fatigue threshold of 16.7% in their study.
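
The time and intensity product described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic in the worked example; the function and constant names are hypothetical and do not come from Bystrom and Kilbom (1990), and the sketch assumes that the "time ratio" is contraction time divided by total cycle time.

```python
# Illustrative sketch only: names are hypothetical, not from the cited study.
FATIGUE_THRESHOLD_PCT_MVC = 16.7  # threshold value quoted in the text


def effective_load(contraction_s, relaxation_s, intensity_pct_mvc):
    """Return (contraction time / cycle time) x intensity, in percent of MVC."""
    duty_cycle = contraction_s / (contraction_s + relaxation_s)
    return duty_cycle * intensity_pct_mvc


def below_fatigue_threshold(contraction_s, relaxation_s, intensity_pct_mvc,
                            threshold=FATIGUE_THRESHOLD_PCT_MVC):
    """True when the time-intensity product is under the quoted threshold."""
    load = effective_load(contraction_s, relaxation_s, intensity_pct_mvc)
    return load < threshold


# Worked example from the text: 7 sec contraction, 3 sec relaxation, 20% MVC.
print(round(effective_load(7, 3, 20), 1), below_fatigue_threshold(7, 3, 20))
```

Under these assumptions the example in the text falls below the threshold, whereas a continuous contraction at 25% MVC (no relaxation) does not.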

Obviously, the exact numbers as stated above, if
they are indeed valid, would vary depending on the muscles
involved and probably many other factors as well. Robin,
Goel, Somodi, & Luschei (1992) observed a difference in
endurance for tongue pressure against an air-filled bulb in
trumpet players and high school debaters compared to
control subjects without training in those activities.
Sustained tongue pressures were significantly longer at 25%
and 50% of maximum pressure for the experimental
subjects. Robin et al. suggested that possible exercise-related
changes in the proportion of fatigue-resistant muscle fibers
brought about by trumpet playing and competitive debate
may have led to the observed differences in endurance
times.

The notion of a threshold fatigue effect, depending
on both time and force in relation to velopharyngeal control,
appears worthy of exploration. A common anecdotal report
by clinicians is that patients often have the capability of
producing a sustained [s] or isolated words with no evidence
of velopharyngeal incompetency but that the competency
breaks down in connected speech, especially “casual”
speech. It is possible that such individuals may not
necessarily be within a fatigue state but rather have developed a
pattern of velopharyngeal control to avoid a fatigue state that
might occur very rapidly in the presence of increased or
more sustained muscle force generation. Perhaps because
of weaker velopharyngeal muscles, these individuals may
have a lower threshold of fatigue than individuals having
normal strength and they may have developed a pattern of
neuromotor control to remain below the fatigue threshold.
In a recent study, Warren, Dalston, and Mayo (1993)
concluded that if the velopharyngeal port is open for an
inappropriately long time interval compared to normal,
hypernasal speech is likely to result. Their results and
conclusion fit well with the concept expressed here with
regard to duration of velopharyngeal closure as affected by
a fatigue threshold. That is, one way of alleviating fatigue
in velopharyngeal closure muscles is to avoid excessive
opposition to gravity and other forces that naturally tend to
open the velopharyngeal port.

Future studies are needed to help elucidate the
possible beneficial effects of resistance exercises such as
CPAP therapy (Kuehn, 1991) and other therapeutic
techniques designed to strengthen the velopharyngeal
musculature and possibly reduce fatigue effects. In speakers
exhibiting hypernasality, a reserve capacity for the levator
muscle may exist above that used for speech, as shown for
people with normal speech in this study. It may be possible
for such individuals to tap into that reserve capacity with
proper training procedures during speech. Also, speakers
exhibiting hypernasality may have lower thresholds of
fatigue because of weaker velopharyngeal closure muscles
and other factors. It may be possible to raise the threshold
of fatigue, thereby utilizing a more suitable range of activity
for normal speech purposes.

The current study will be extended using subjects
with abnormal velopharyngeal mechanisms to determine
where in their total operating range levator activity for
speech lies. We also intend to explore the notion of
velopharyngeal muscle fatigue thresholds in subject groups
with normal and abnormal velopharyngeal mechanisms,

Abstract

The relative contributions of the levator veli
palatini, palatoglossus, and palatopharyngeus muscles were
assessed relative to a range of positions of the velopharynx
during production of the vowels [a] and [i] by four normal
adult speakers. The results indicate that velopharyngeal
positioning is determined by the relative contributions of
the levator veli palatini, palatoglossus, and palatopharyngeus
muscles. There was an increase in coefficients of
determination (i.e., amount of closure level variability explained)
when activity levels of all three muscles were included in the
statistical model compared to activity in any one muscle
analyzed independently. Both consistent and inconsistent
relationships among activity levels in the three
velopharyngeal muscles studied were observed across
speaker and vowel produced.


According to Bernstein (1967), a given motor task
can be performed in a variety of ways. A change in activity
of one structure in a system may induce variations in the
activity of other structures to accomplish a desired task.
The motor control system imposes constraints on the
component structures to simplify the control process. That is,
function based interaction rules are established. Fowler,
Rubin, Remez, and Turvey (1980) refer to such an
interactive system as a coordinative structure. In speech, the
components of the coordinative structure might be
articulators (e.g., the lip or the jaw) or muscles, and their
actions are nested within the overall goal of perceptually
adequate speech output.

A number of studies have addressed motor control
of the speech articulators within the perspectives of Bernstein
(e.g., Folkins and Canty, 1986; Folkins, Linville, Garrett,
and Brown, 1988; Gracco, 1988). However, with the
exception of a theoretical discussion by Folkins (1985),
control of the velopharyngeal mechanism has not been
considered within this context. Within the velopharyngeal
mechanism, a coordinative structure might be indicated by
interactions among the velum and lateral pharyngeal walls,
or interactions among activation levels of the velar muscles.
On a muscular level, the levator veli palatini, palatoglossus,
and palatopharyngeus might be thought of as a coordinative
structure. While levator veli palatini muscle activity is
associated with velar elevation gestures, Fritzell (1969)
noted that the extent of velar elevation and magnitude of
levator veli palatini muscle activity were not always highly
correlated. Similarly, Kuehn, Folkins, and Cutting (1982)
found in a study of oralized vowels and fricatives that
“levels of levator muscle activity independent of other
muscle activity were not directly related to velar position”
(p. 30). Kuehn et al. suggested that a trading relationship
might exist among the levator veli palatini, palatoglossus,
and palatopharyngeus muscles in positioning the velum.
They postulated further that various combinations of
activity in these three muscles might be associated with the same
velar position.

Although a speaker may not vary the relative
combination of velar and lateral pharyngeal wall movement
during multiple repetitions of a speech task, the overall
amount of closure used to produce a speech sample will
often vary greatly from repetition to repetition.
Supposedly, the variation in velopharyngeal opening interacts with
changes in the impedance of the oral cavity to produce
desired percepts of nasality (Folkins, 1985). There is also
great variability in the size and duration of the bursts of
electromyographic activity recorded from the velar muscles
during speech. Typically, it is not possible to distinguish
which aspects of this variability in any one muscle are
related to variability in velar opening and which are related
to interactions among muscles to produce the same
movements.

Moon and Jones (1991) have shown that visual
feedback can be used to teach speakers to control and vary
the amount of velopharyngeal opening used to produce a
speech sample. One advantage of their procedure is that one
can ensure that a range of velopharyngeal openings is
systematically studied. By manipulating the extent of
velopharyngeal opening, one could evaluate the
combinations of velopharyngeal muscle activity that may be
associated with a given velopharyngeal opening size. While
previous investigators have studied activation levels and
ascribed roles to individual velopharyngeal muscles, there
have been no systematic studies of interactions among the
levator veli palatini, palatoglossus, and palatopharyngeus
muscles. Such information is important to our
understanding of both normal and ultimately disordered velopharyngeal
function. This study investigates relative contributions of
the levator veli palatini, palatoglossus, and palatopharyngeus
muscles in positioning of the velum during speech
production.

Methods 

Subjects 

Four  young  adults,  three  females  and  one  male, 
served  as  subjects  for  this  investigation.  All  were  judged  by 
the  experimenters  to  have  normal  resonance  balance  and 
articulation.  None  reported  a  history  of  speech,  language, 
or  hearing  disorders.  One  subject  (subject  A)  had  been 
trained  in  singing. 

Phototransduction

Velopharyngeal opening and closing gestures were
transduced using the phototransducer system described by
Dalston (1982). A description of the specific device used
in this study is provided by Moon and Jones (1991). The
transducer was passed transnasally and positioned with the
light emitting fiber below the velopharyngeal port and the
light detecting fiber placed above the velopharyngeal port.
Phototransducer output was then amplified for each subject
to produce 0 volts during rest nasal breathing and a 2 volt
deflection during velopharyngeal closure for [i]. Because
the phototransducer cannot provide absolute velopharyngeal
opening area, the range of closure was denoted as 0% (0
volts) to 100% (2 volts).
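
The calibration described above amounts to a linear rescaling of the amplified phototransducer voltage. A minimal Python sketch follows, assuming a strictly linear mapping between the two calibration points; the function name and parameter names are hypothetical and are used only for illustration.

```python
def closure_percent(volts, rest_volts=0.0, closed_volts=2.0):
    """Map amplified phototransducer output to relative closure (0 to 100%).

    rest_volts:   output during rest nasal breathing (0 V in the text)
    closed_volts: output during velopharyngeal closure for [i] (2 V in the text)
    A linear mapping between the two points is an assumption of this sketch.
    """
    span = closed_volts - rest_volts
    if span == 0:
        raise ValueError("rest and closed calibration voltages must differ")
    return 100.0 * (volts - rest_volts) / span


# A reading of 1.5 V would correspond to the 75% closure target.
print(closure_percent(1.5))
```

Because the scale is relative, values from different subjects are comparable only as proportions of each subject's own calibrated range, not as absolute opening areas.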

Electromyography 

Following a light application of topical anesthetic
(4% Lidocaine), bipolar hooked wire electrodes were
inserted into the levator veli palatini, palatoglossus and
palatopharyngeus muscles on each subject’s right side. The
electrodes were constructed of 110 µm stainless steel wire
(Medwire 316 SS 3T), and were inserted using half-inch 30
gauge hypodermic needles. Levator veli palatini electrodes
were inserted at the dimple of the elevated velum in a
posterior, lateral, and superior direction, following the
course of the levator muscle. Depth of insertion was
approximately 10 mm. Palatoglossus and palatopharyngeus
electrodes were inserted into the midportion of the anterior
and posterior faucial pillars, respectively. Depth of
insertion was approximately 3 mm. Spacing between all
electrode pairs was approximately 5 mm. Verification of
electrode placement in the levator veli palatini was made
during production of [s]. Verification of placements in
palatoglossus and palatopharyngeus was made during
swallowing. Electrodes were repositioned or reinserted if no
electromyographic signal was obtained or if the presence of
artifact was observed during placement verification tasks.

Speech  Tasks 

Each  subject  was  positioned  to  view  a  two  channel 
storage  oscilloscope.  The  target  velopharyngeal  closure 
level  was  displayed  on  one  channel.  The  amplified  and  low 
pass  filtered  (30  Hz)  phototransducer  output  signal  was 
displayed  on  the  second  oscilloscope  channel.  Both  the 
target and phototransducer signals were recorded on a
digital  instrumentation  recorder  (Sony  PC108M). 

Six speech conditions were employed: the vowels
[a] and [i] each produced at 25, 50, and 75% closure.
Following the procedures developed by Moon and Jones
(1991), subjects viewed the phototransducer output on the
oscilloscope and attempted to match the target level
throughout the duration of each 10 second trial. The subjects were
instructed to phonate the vowel normally for approximately
1-2 seconds and then to open the velopharynx to the target
level for the remainder of the trial. A minimum of ten
attempts were recorded within each condition.

Analyses 

Recorded electromyographic and phototransducer
signals were digitized using a 10 kHz sampling rate,
rectified, smoothed with a 40 ms time constant, and
downsampled by a factor of eight. Within each trial, 625 ms
segments characterized by a stable phototransducer signal
were extracted for analysis. The 625 ms time window was
chosen to ensure that electromyographic activity was also
relatively stable. Segments were chosen regardless of
whether or not the target closure level was attained. That
is, the targets were utilized only to elicit a range of closure
levels and accuracy of target matching was not a measured
variable in this study. The extracted segments were then
analyzed to determine relative velopharyngeal closure
level in the 0 to 100% range and the average corresponding
activity level in each of the three velopharyngeal muscles.
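
The processing chain described above (rectify, smooth, downsample, average over a 625 ms window) can be sketched as follows. This is a minimal illustration under stated assumptions, not the software used in the study: the smoothing step is assumed to be a first-order exponential low-pass filter with a 40 ms time constant, downsampling is assumed to be simple decimation, and all function names are hypothetical.

```python
import math

SAMPLE_RATE_HZ = 10_000   # digitizing rate stated in the text
TIME_CONSTANT_S = 0.040   # smoothing time constant stated in the text
DECIMATION = 8            # downsampling factor stated in the text
WINDOW_S = 0.625          # analysis window stated in the text


def rectify(signal):
    """Full-wave rectification: absolute value of every sample."""
    return [abs(x) for x in signal]


def smooth(signal, sample_rate_hz=SAMPLE_RATE_HZ, tau_s=TIME_CONSTANT_S):
    """First-order exponential low-pass filter (filter type is an assumption)."""
    alpha = 1.0 - math.exp(-1.0 / (sample_rate_hz * tau_s))
    out, state = [], 0.0
    for x in signal:
        state += alpha * (x - state)
        out.append(state)
    return out


def downsample(signal, factor=DECIMATION):
    """Keep every factor-th sample (plain decimation, also an assumption)."""
    return signal[::factor]


def emg_envelope(raw):
    """Rectify, smooth, and downsample a raw EMG trace."""
    return downsample(smooth(rectify(raw)))


def window_mean(envelope, start_index):
    """Mean of one 625 ms window of the downsampled envelope."""
    rate_hz = SAMPLE_RATE_HZ / DECIMATION      # 1250 Hz after downsampling
    n = round(WINDOW_S * rate_hz)              # samples per window
    segment = envelope[start_index:start_index + n]
    if len(segment) < n:
        raise ValueError("window extends past the end of the signal")
    return sum(segment) / n
```

In use, `window_mean` would be applied at positions where the phototransducer trace is judged stable; how stability was judged is not specified in the text and is therefore left outside this sketch.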

Separate analyses were performed for each subject
and vowel combination. The analyses included univariate
and multivariate regression analyses (response surface
analyses) of muscle activation level as a function of relative
closure level using linear, quadratic, and interaction (in
multivariate analyses) terms. Prior to analysis, the
electromyographic data were normalized within each muscle for
each subject. For example, the activation levels for levator
veli palatini during production of the vowel [a] by Subject
A were converted to percentage of maximum activation
using the maximum raw activation level of levator during
normal production of that vowel by that subject as the
reference.
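
The normalization step can be illustrated with a short Python sketch. The function name is hypothetical, and the sketch assumes that the reference is simply the largest raw value observed during normal production of the same vowel by the same subject, as the example in the text suggests.

```python
def normalize_to_percent_max(activations, reference_activations):
    """Express raw activation levels as a percentage of a reference maximum.

    activations:           raw levels from the analysis segments
    reference_activations: raw levels recorded during normal production of
                           the same vowel by the same subject (assumption)
    """
    reference_max = max(reference_activations)
    if reference_max <= 0:
        raise ValueError("reference maximum must be positive")
    return [100.0 * a / reference_max for a in activations]


# Hypothetical raw values in arbitrary units, for illustration only.
print(normalize_to_percent_max([1.0, 2.0], [2.0, 4.0]))
```

Normalizing within muscle, subject, and vowel removes differences in electrode placement and gain, so the regression coefficients are comparable only in relative terms.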

Results 

Univariate  Analyses 

Subject-specific coefficients of determination (R²
expressed as a percentage) and mean square errors are
shown in Table 1 for closure level regressed against each of
the three muscles during production of [a] and [i] by each
subject. With three exceptions out of 24, R² values for the
full model (linear plus quadratic terms) are statistically
significant (p < 0.0001). However, the coefficients are low
in many instances. In 18 of the 24 cases (4 subjects x 3
muscles x 2 vowels) less than 50% of the variability in
observed velopharyngeal closure level can be explained by
level of activity in any one muscle.

Examples of electromyographic data associated
with the best and worst coefficient of determination for the
levator veli palatini during production of [i] are shown in
Figure 1. These examples are from Subject B and Subject
D. It is clear from the top panel that a relationship between
levator activity and closure level exists. The bottom panel
does not display a clearly identifiable relationship. It is also
apparent from Figure 1 that subjects did not demonstrate the
same range of closure levels. That is, while targets of 25,
50 and 75% closure were presented to the subjects, some
(e.g., Subject D) had difficulty reaching the 25% level
consistently.

Figure 1. Data sets resulting in best (upper, Subject B) and worst (lower,
Subject D) coefficients of determination for levator veli palatini during [i]
production.

Multivariate Analyses

Subject-specific coefficients of determination (R²
expressed as a percentage) for the full multivariate model
are presented in Table 2. In addition, the contributions to
the coefficients of determination corresponding to the
partial F-tests for the set of linear effects, the set of quadratic
effects, and the set of two-way interaction effects are listed.
Coefficients of determination for the full model are
significant (p < 0.0001) for each subject. The contributions of the
linear terms are significant in all eight cases (2 vowels x 4
subjects). The contributions of the quadratic terms to the
model are significant in seven of the eight cases. The
interaction term is significant in four of the eight cases.
Since each set of terms is significant in at least half of the
cases, they are all retained in the final model for all subjects
regardless of their statistical significance. However, it is
clear from Table 2 that the linear term contributes most to
the overall coefficient of determination for each subject
within each vowel. The exception is the quadratic term for
Subject B producing [a]. On average, the quadratic and
two-way interaction terms account for only 2.6% of the
variability in closure level.
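
The structure of the full multivariate model (three linear, three quadratic, and three two-way interaction terms) can be made concrete with a short Python sketch. Here LEV, PG, and PP stand for normalized levator veli palatini, palatoglossus, and palatopharyngeus activity. The function names are hypothetical and the coefficient values in the usage line are invented purely for illustration; the sketch shows only how the terms are formed and how R² expressed as a percentage is computed, not how the model was fitted in the study.

```python
def response_surface_terms(lev, pg, pp):
    """Return the nine predictor terms of the full second-order model."""
    return {
        "LEV": lev, "PG": pg, "PP": pp,                            # linear
        "LEV^2": lev ** 2, "PG^2": pg ** 2, "PP^2": pp ** 2,       # quadratic
        "LEV*PG": lev * pg, "LEV*PP": lev * pp, "PG*PP": pg * pp,  # interaction
    }


def predict_closure(intercept, coefficients, lev, pg, pp):
    """Predicted closure level; terms missing from coefficients count as zero."""
    terms = response_surface_terms(lev, pg, pp)
    return intercept + sum(coefficients.get(name, 0.0) * value
                           for name, value in terms.items())


def r_squared_percent(observed, predicted):
    """Coefficient of determination expressed as a percentage."""
    mean_obs = sum(observed) / len(observed)
    ss_total = sum((y - mean_obs) ** 2 for y in observed)
    if ss_total == 0:
        raise ValueError("observed values must not all be equal")
    ss_resid = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    return 100.0 * (1.0 - ss_resid / ss_total)


# Invented coefficients, for illustration only.
print(predict_closure(1.0, {"LEV": 2.0, "LEV^2": -0.5}, lev=2.0, pg=0.0, pp=0.0))
```

A positive linear coefficient combined with a negative quadratic coefficient, as in the invented example, yields a curve that rises with activity but flattens at higher levels.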


For each subject, the percentage of previously
unexplained variability (from best muscle in univariate
analysis) captured by the multivariate analysis was
assessed. For [a], these values are 36.5%, 35.6%, 73.6%, and
29.34% for subjects A to D, respectively. For [i], the values
are 66.1%, 12.9%, 34.4%, and 46.0% for subjects A to D,
respectively. Similarly, reductions in mean square error
can be observed by comparing the multivariate model mean
square errors in Table 2 with the univariate results in Table
1. It is evident that the multivariate mean square error
values for any subject-vowel combination are consistently
lower than all of the univariate mean square errors for that
subject and vowel.
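
The text does not give the formula for the "percentage of previously unexplained variability" captured by the multivariate model. One plausible reading, offered here only as an assumption, is the gain in R² divided by the variability left unexplained by the best single-muscle model. The function name is hypothetical and the numbers in the usage line are invented.

```python
def percent_unexplained_captured(r2_best_univariate, r2_multivariate):
    """Share of the variability left unexplained by the best univariate model
    that the multivariate model accounts for. Both inputs are R² values
    expressed as percentages. The formula is an assumed interpretation.
    """
    unexplained = 100.0 - r2_best_univariate
    if unexplained <= 0:
        raise ValueError("best univariate model already explains everything")
    return 100.0 * (r2_multivariate - r2_best_univariate) / unexplained


# Invented values: best single muscle explains 40%, full model explains 70%.
print(percent_unexplained_captured(40.0, 70.0))
```

Under this reading, a value of 50% would mean that the additional muscles and higher-order terms recover half of what the best single muscle could not explain.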


Using inverse distance interpolation, three
dimensional mesh plots were generated from relative closure level
and electromyographic muscle activity data. Figure 2
shows a three dimensional mesh plot (left) and the
individual data points (right) used to derive the mesh plot.
Figures 3, 4, 5 and 6 depict three dimensional mesh plots
from each subject for the two vowels. Because the
interactions among all three muscles and closure level require four
dimensions and thus cannot be displayed on a three
dimensional graph, each figure contains the three combinations
of muscle pairs.
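
Inverse distance interpolation estimates the value at each mesh node as a weighted average of the measured points, with weights that fall off with distance. The sketch below is a generic illustration of that technique, not the plotting software used in the study. The power parameter of 2 and the assignment of the two muscle activity levels to the horizontal axes with closure level as the interpolated quantity are assumptions; all names are hypothetical.

```python
import math


def idw_interpolate(points, x, y, power=2.0):
    """Inverse-distance-weighted estimate at (x, y).

    points: iterable of (x_i, y_i, z_i) tuples, e.g. two muscle activity
            levels and the associated closure level (axis roles assumed).
    """
    numerator = 0.0
    denominator = 0.0
    for xi, yi, zi in points:
        distance = math.hypot(x - xi, y - yi)
        if distance == 0.0:
            return zi                      # exact hit on a data point
        weight = 1.0 / distance ** power
        numerator += weight * zi
        denominator += weight
    if denominator == 0.0:
        raise ValueError("at least one data point is required")
    return numerator / denominator


def idw_mesh(points, x_values, y_values, power=2.0):
    """Evaluate the interpolant on a rectangular grid (rows follow y_values)."""
    return [[idw_interpolate(points, x, y, power) for x in x_values]
            for y in y_values]


# Two invented data points; the midpoint receives equal weight from each.
print(idw_interpolate([(0, 0, 10.0), (2, 0, 30.0)], 1, 0))
```

Because the interpolant is defined everywhere on the grid, mesh surfaces can extend beyond the region actually sampled, which is why distinguishing recorded ranges from interpolated regions matters when reading such plots.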


Figure 2. Three dimensional mesh plot of levator veli palatini versus
palatoglossus for Subject B producing [i] (left) and individual data points
(right) used to derive mesh plot.


Figures 3 through 6 clearly illustrate both
similarities and differences in the nature of the influence of each
muscle relative to the other muscles as a function of subject
and of the vowel being produced. For example, at the top
left of Figure 3, essentially unchanging levels of activity in
the levator muscle are associated with decreasing levels of
palatoglossus activity as closure level increases for
production of [a] by Subject A. The bottom graph shows decreases
in both palatoglossus and palatopharyngeus with increasing
closure level.

In Figure 4, a different activation pattern is
observed for [a] (left side) compared to the other subjects.
Here, levator, palatoglossus and palatopharyngeus muscle
activity all appear to increase as closure level increases.
This subject (Subject B) also displays more similarity in
activation patterns between the vowels [a] and [i] than the
other three subjects. Like Subject A, activation patterns for
Subjects C (Figure 5) and D (Figure 6) also differ as a
function of vowel produced.

Figure 3 - left. Three dimensional mesh plots for Subject A producing the vowels [a] and [i]. Shaded areas indicate minimum and maximum recorded EMG
levels for each muscle. Interpolated mesh plots extend to EMG minima and maxima throughout 0 - 100% max EMG range for each muscle. Figure 4 -
right. Three dimensional mesh plots for Subject B producing the vowels [a] and [i]. Shaded areas indicate minimum and maximum recorded EMG levels
for each muscle. Interpolated mesh plots extend to EMG minima and maxima throughout 0 - 100% max EMG range for each muscle.

A more systematic interpretation of muscle interactions
displayed graphically in Figures 3 through 6 was
accomplished using the signs of each of the parameter
estimates (linear, quadratic, and two-way interaction) of the
multivariate model generated for each subject and each
vowel. Linear terms include LEV (levator), PG
(palatoglossus), and PP (palatopharyngeus). Quadratic
terms include LEV2, PG2, and PP2. Two-way interaction
terms include LEV*PG, LEV*PP, and PG*PP. Regardless
of subject or vowel, parameter LEV is always positively
related to velopharyngeal closure level. Of additional
interest is the observation that the LEV2 parameter in the
multivariate model is negatively signed during [i] for three
of the four subjects and during [a] by two of the four
subjects. The negative sign is indicative of a curvilinear
pattern of increasing levator activity with increasing closure
level. This pattern is evident in the upper graphs of
Figure 4.
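
To make the structure of such a second-order model concrete, the following is a minimal sketch of the nine terms named above (three linear, three quadratic, three two-way interactions). The function names, the intercept, the assumption that EMG levels are expressed as a fraction of maximum, and the coefficient values are all illustrative; they are not the parameter estimates obtained in this study.

```python
def model_terms(lev, pg, pp):
    """Return the nine predictor terms for one observation.

    lev, pg, pp: EMG activity levels (assumed here to be scaled 0.0-1.0).
    """
    return {
        "LEV": lev, "PG": pg, "PP": pp,                       # linear terms
        "LEV2": lev * lev, "PG2": pg * pg, "PP2": pp * pp,    # quadratic terms
        "LEV*PG": lev * pg, "LEV*PP": lev * pp, "PG*PP": pg * pp,  # interactions
    }


def predict_closure(coefs, intercept, lev, pg, pp):
    """Evaluate the model: intercept plus the sum of coefficient * term.

    Terms missing from coefs are treated as having a zero coefficient.
    """
    terms = model_terms(lev, pg, pp)
    return intercept + sum(coefs.get(name, 0.0) * value
                           for name, value in terms.items())


# Hypothetical coefficients: positive LEV combined with negative LEV2.
example_coefs = {"LEV": 2.0, "LEV2": -1.0}
low = predict_closure(example_coefs, 0.0, 0.0, 0.0, 0.0)
mid = predict_closure(example_coefs, 0.0, 0.5, 0.0, 0.0)
high = predict_closure(example_coefs, 0.0, 1.0, 0.0, 0.0)
```

With these hypothetical values, predicted closure rises with levator activity, but the gain from 0.5 to 1.0 is smaller than the gain from 0.0 to 0.5, which is the kind of curvilinear relation a negatively signed quadratic term produces.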

The contributions made by the palatoglossus muscle
to the multivariate model appear to be vowel specific. In
addition, its contribution is more variable than that of the
levator muscle. For the vowel [i], the PG parameter is
always negatively signed. The negative sign suggests that
palatoglossus activity decreases with increasing closure
level. The pattern is less obvious for [a]. Two subjects
display a positive relationship between palatoglossus activity
and closure level (as indicated by a positively signed PG
model parameter estimate), while the other two display a
negative relationship.

The PG2 parameter appears to be more variable
than LEV2. For [a], it is negatively signed for two subjects
and positively signed for one subject. For [i], it is positively
signed for two subjects and negatively signed for one. The
negative sign is indicative of a curvilinear pattern of
increasing palatoglossus activity with increasing closure
level. The positive sign is indicative of a curvilinear
decrease in palatoglossus activity with increasing closure
level.

The contribution of palatopharyngeus muscle is
also important to the multivariate model. For [i], the
palatopharyngeus parameter (PP) is negatively related to
closure level in all cases. For [a], its relationship to closure
level varies across subjects. The PP2 parameter is negative
for two subjects during [a] and positive for one subject.

Discussion 

The results of this study indicate that
velopharyngeal positioning in space is determined by the
relative contributions of the levator, palatoglossus, and
palatopharyngeus muscles. This is evidenced by the dramatic
increase in closure level variability explained using
the multivariate model over the univariate model. These
data support the notions of earlier investigators (Shelton,
Harris, Sholes, and Dooley, 1970; Seaver and Kuehn, 1980;
Kuehn et al., 1982) that activity levels in more than one
velar muscle must be taken into account. The results of the
present study cannot, however, be taken as a lack of support
for early characterizations (Fritzell, 1969; Bell-Berti, 1976)
of the levator muscle as the primary muscle involved in
velar elevation. Based on anatomical position alone,
velopharyngeal elevation must be greatly influenced by
contraction of the levator veli palatini. The results of the
present investigation do suggest, however, that the levator
muscle performs as one element of a coordinated structure
of at least three velopharyngeal muscles. Within this
framework, position control of the velopharyngeal mechanism
is flexible in that it allows for varying combinations
of muscle activation among the constituent muscles.

Figure 5 - left. Three dimensional mesh plots for Subject C producing the vowels [a] and [i]. Shaded areas indicate minimum and maximum recorded EMG
levels for each muscle. Interpolated mesh plots extend to EMG minima and maxima throughout 0 - 100% max EMG range for each muscle. Figure 6 - right.
Three dimensional mesh plots for Subject D producing the vowels [a] and [i]. Shaded areas indicate minimum and maximum recorded EMG levels for each
muscle.

It is evident from Figures 3, 4, 5 and 6 and the
multivariate analysis that muscle interactions vary both
across subjects and vowels. The amount of variability in
each muscle’s contribution to velar position is also evident
from Table 1, where univariate coefficients of determination
range from less than 1% to 87%. However, the
multivariate analyses substantiate the importance of a
coordinative structure conceptualization of velar muscle
activity for positioning of the velopharyngeal mechanism.
The parameter estimates of the multivariate model reveal
similarities and differences across both subjects and vowels
that may provide insights into the nature of the coordinative
structure framework.
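
For readers less familiar with the coefficient of determination used to compare the univariate and multivariate models, the sketch below computes it from observed and model-predicted closure levels. The function name and the sample numbers are hypothetical and are not data from this study.

```python
def r_squared(observed, predicted):
    """Proportion of variance in observed values explained by predictions.

    Computed as 1 - (residual sum of squares / total sum of squares).
    """
    if len(observed) != len(predicted) or not observed:
        raise ValueError("observed and predicted must be non-empty and equal length")
    mean_obs = sum(observed) / len(observed)
    ss_total = sum((y - mean_obs) ** 2 for y in observed)
    if ss_total == 0:
        raise ValueError("observed values have no variance")
    ss_resid = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    return 1.0 - ss_resid / ss_total


# Hypothetical closure levels and predictions from some fitted model.
observed_closure = [1.0, 2.0, 3.0, 4.0]
predicted_closure = [1.5, 2.5, 2.5, 3.5]
example_r2 = r_squared(observed_closure, predicted_closure)
```

A value near 0 (such as the "less than 1%" end of the univariate range) means the predictor explains almost none of the variability in closure level, while a value near 1 means it explains nearly all of it.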

Regardless of subject or vowel, the linear parameter
estimate for the levator muscle (LEV) was always
positively related to velopharyngeal closure level. This
should not be surprising given previously published data
regarding the role of levator and given our knowledge of its
anatomical position. That is, its anatomical position in
normal speakers is conducive to velar elevation upon
contraction. However, earlier investigators (Fritzell, 1969;
Kuehn, Folkins, and Cutting, 1982) suggested that velar
position and magnitude of levator muscle activity do not
appear to be directly related. This conclusion is supported
by the univariate and multivariate analyses conducted in
this study.

Our results show that the contributions made by
the palatoglossus muscle to the multivariate model appear
to be vowel specific. In addition, its contribution is more
variable than that of the levator muscle. During [i],
palatoglossus activity was negatively related to closure
level for all four speakers. For [a] the relationship was
mixed; positive for two speakers and negative for two
speakers. These observations raise questions regarding the
effects of competing activity (i.e., tongue elevation and
pharyngeal wall positioning) on the nature of coordinative
structure interactions within the soft palate. Production of
[i] requires elevation of the tongue whereas production of
[a] does not. Contraction of the palatoglossus muscle to
produce upward movement of the back of the tongue for [i]
would be expected to produce a downward pull on the soft
palate. One might speculate that less palatoglossus activity
occurs at higher velopharyngeal closure levels because a)
the elevated tongue position can be maintained at least in
part by mechanical linkage forces, and/or b) increased
palatoglossus activity in association with increased levator
activity might move the back of the tongue too high. During
[a], however, the consequences of increased palatoglossus
muscle activity and its interaction with levator muscle
activity are not as important. That is, more variability in
tongue back positioning might be tolerated during [a].
There may be support for this notion in Table 1, where
univariate coefficients for palatoglossus are substantially
lower for [i] for three of the four subjects. Further, three of
the four subjects revealed partial F statistic values from the
multivariate analysis for the palatoglossus muscle that were
much lower for [i] than for [a]. In other words, palatoglossus
appears to divide its role during [i], and may be less of a
factor for this vowel.

If the palatoglossus muscle has little influence on
velar positioning during [i], one must ask whether the other
muscles have more influence relative to their performance
during [a]. The results of both the univariate and multivariate
analyses do not provide convincing evidence of this.
The univariate analysis revealed a greater influence of
levator during [i] for only one of the four speakers. The
partial F statistic values from the multivariate analysis were
greater for levator during [i] for only two of the four
speakers. However, the interaction between levator and
palatoglossus muscle did change as a function of the vowel
produced. For [i], the LEV*PG interaction term was always
positively related to closure level. During [a] the relationship
was again inconsistent. One might interpret the
positive LEV*PG term to indicate that the levator and
palatoglossus muscles were not working independently
during this task. That is, for a given level of closure, the
interaction suggests that greater levels of activity were
observed in these muscles than would be expected had
the muscles been working separately. Finally, the multivariate
coefficients of determination during [i] tended to be
lower (with one exception) than those obtained for [a]. That
is, less of the variability in closure level could be explained
by activation levels of the three muscles and their interactions
during [i] than during [a]. This may have been due to
the effects of competing activity associated with tongue
elevation, or to our not having sampled all of the activity in
the three muscles. It might also indicate that activity levels
in other muscles (e.g. musculus uvulae, superior constrictor,
and perhaps the more transverse fibers of
palatopharyngeus) play a role in velar positioning that
becomes more important during certain vowels. Of course,
we can only speculate on the possible influence of these
additional muscles. However, Fritzell (1969) alluded to the
possible influence of superior constrictor activity on
velopharyngeal closure and the consequence of increased
superior constrictor activity on the relative influence of the
levator muscle in some subjects. While Kuehn, Folkins,
and Cutting (1982) observed that superior constrictor was
active for all speech sounds studied, they were unsure
whether the magnitudes of activity seen were “sufficient to
contribute in a substantial way to the interaction of forces
for velar movement” (p. 34). Regarding musculus uvulae,
Kuehn, Folkins and Linville (1988) proposed an extensor
role. That is, contraction of the musculus uvulae was
proposed to exert a compressional force along the top side
of the velum which would tend to straighten the curved
velum. Such a straightening gesture could be used to
modify velar position in a manner unaffected by changes in
activity levels of the levator and palatoglossus musculature.
However, Kuehn et al. (1988) found musculus uvulae
activity to be highly correlated with levator veli palatini.

Finally, the contribution of palatopharyngeus
muscle to the multivariate model may also be important in
the positioning of the velopharyngeal mechanism.
Palatopharyngeus activity during [i] was always negatively
related to closure level. It is of interest that the nature of
palatopharyngeus involvement as a function of vowel tends
to parallel that of palatoglossus even though
palatopharyngeus would not be expected to be involved in
tongue elevation. It is also of interest that, during the
production of both [i] and [a], the LEV*PP interaction term
in the multivariate model was always negative. In contrast
to the positive LEV*PG interaction, this interaction may be
interpreted to indicate independent influences of these two
muscles on closure level.

The results of this study are in partial agreement
with earlier characterizations of the role of palatopharyngeus
during speech production. Fritzell (1979) suggested that
palatopharyngeus activity was associated with narrowing
of the pharynx during [a]. As was the case with palatoglossus
activity during tongue elevation for [i], one might characterize
narrowing of the pharynx for [a] as a competing
activity that would affect the interrelationships among
these muscles during the production of [a]. However, we
observed that palatopharyngeus was always negatively
related to closure level during [i], but its relationship to
closure during [a] was variable. It is difficult to explain this
observation using the same arguments presented in the case
of the palatoglossus muscle. However, it could be argued
that, for these four subjects, palatopharyngeus plays less of
a role in velopharyngeal positioning than do levator or
palatoglossus.

To summarize, control of position of the
velopharyngeal mechanism can be explained using a
coordinative structure notion of muscle interaction. Activity
levels of individual muscles alone do not account for
velar position as well as the combined interactive activation
levels of the levator, palatoglossus, and palatopharyngeus.
Multivariate models depicting the role of each muscle and
their interactions in velopharyngeal positioning during the
tasks studied here allow for a description of such interactions.
Interactions among the levator veli palatini,
palatoglossus, and palatopharyngeus and velopharyngeal
closure level were observed in the data that describe the
complexity of control of the velopharyngeal mechanism
that was once assigned exclusively to the levator veli
palatini muscle. It is clear from our data that there may be
some consistent relationships between velopharyngeal
muscle activity and their interactions and control of the
velopharyngeal mechanism. Some of these relations may
be associated with anatomic positioning (i.e. levator veli
palatini muscle always positively related to velar elevation)
while others may be related to the relationship between
velopharyngeal movement and movement of other speech
articulators (i.e. palatoglossus always negatively related to
velopharyngeal positioning during [i]). It is also clear from
our data that some variability in muscle activity and their
interrelations also exists. Between-speaker variability may
be due to variations in anatomical positioning of the muscles
(Kuehn and Azzam, 1978) affecting relative force levels or
simply to motor system variability common to all controlled
systems. Additional investigations are underway to
further delineate relationships among velopharyngeal
muscles during speech production.

Abstract

Weakness and fatigue of the speech production
system may contribute to articulatory imprecision and
timing difficulties in the speech of people with Parkinson’s
disease. Nineteen individuals with Parkinson’s disease and
19 healthy matched control subjects were tested for strength
and endurance of the tongue. Tongue function was evaluated
by the Iowa Oral Performance Instrument, a pressure
sensing device. In addition, speech was evaluated for
articulatory imprecision, overall speech defectiveness, and
speech rate. Subjects with Parkinson’s disease were found
to have lower tongue strength but comparable tongue
endurance when compared to matched control subjects.
The contributions of peripheral and central processes of
fatigue are discussed in light of the present findings. The
subject groups did not differ significantly for overall speech
defectiveness or interpause speech rate, but the speech
articulation of the subjects with Parkinson’s disease was
perceived as less precise than that of the control subjects.
Because most of the subjects in this investigation had
perceptibly normal or mildly disordered speech, a potential
relationship between tongue function and speech proficiency
could not be examined adequately.

A common characteristic of the speech disorder of
Parkinson’s disease, hypokinetic dysarthria, is imprecision
of articulation (Canter, 1965; Chenery, Murdoch, & Ingram,
1988; Darley, Aronson, & Brown, 1969; Ewanowski, 1964;
Laszewski, 1956; Logemann, Fisher, Boshes, & Blonsky,
1978; Morrison, Rigrodsky, & Mysak, 1970; Solomon &
Hixon, 1993; Tanner, 1976). Logemann, Boshes, and
Fisher (1972) reported that articulatory errors were due to
inadequate constriction or occlusion of the upper airway.
Their analysis revealed a “sequence of articulatory
degeneration” from posterior to anterior placements of
articulation with disease progression.

Another aspect of speech that can be abnormal in
this population is speech rate. Speech rate may be too fast
(Hammen, 1990; Hammen, Yorkston, & Beukelman, 1989;
Hanson & Metter, 1983; Tanner, 1976; Yorkston, Hammen,
Beukelman, & Traynor, 1990), too slow (Anthony &
Farquharson, 1975; Boshes, 1966; Kammermeier, 1969;
Peacher, 1950), variable (Critchley, 1981; Darley et al.,
1969; Ludlow & Bassich, 1983; Metter & Hanson, 1986),
accelerating (Critchley, 1981; Hirose et al., 1981; Streifler
& Hofman, 1984), or normal (Alp, 1988; Pitcairn, Clemie,
Gray, & Pentland, 1990). Individual differences between
speakers appear to be the rule (Canter, 1963).

The clinical characteristics of rigidity and
bradykinesia have been hypothesized to account for the
predominant articulatory and temporal abnormalities in
Parkinson’s disease (Hunker, Abbs, & Barlow, 1982).
Increased background muscle activity in lip muscles has
been reported as evidence for rigidity (Hunker et al., 1982;
Leanderson, Meyerson, & Persson, 1972; Marquardt, 1973).
Rigidity often is assumed to cause hypokinesia in Parkinson’s
disease, but this concept has been challenged (Caligiuri,
1987). The presence of bradykinesia in the orofacial system
has not been established definitively. Although selective
reductions in lip and/or jaw displacement and velocity have
been reported in people with Parkinson’s disease during
speech (Caligiuri, 1987; Connor, Abbs, Cole, & Gracco,
1989; Forrest, Weismer, & Turner, 1989; Hunker et al.,
1982), the relationship between displacement amplitude
and velocity has been demonstrated to be normal (Forrest
et al., 1989). Although the mechanisms for movement
difficulties in the orofacial system are unknown, the tongue
appears to be affected for speech. Acoustic data indicate
that the center frequencies of vowel formants may be
abnormal (Tanner, 1976) or the extent and speed of formant
transitions are reduced for speakers with Parkinson’s disease
(Connor, Ludlow, & Schulz, 1989; Forrest et al.,
1989).

In addition to the classic motor signs of Parkinson’s
disease, weakness and fatigue of the speech production
system may contribute to articulatory imprecision and
timing difficulties. A small number of studies have found
reduced tongue strength or endurance to be related to
speech disorders in populations other than Parkinson’s
disease. Children and adults with a variety of articulation
and fluency disorders (Palmer & Osborn, 1940) and adults
with amyotrophic lateral sclerosis (Dworkin, 1978; Dworkin,
Aronson, & Mulder, 1980) have demonstrated lower than
normal tongue strength. Children with developmental
apraxia of speech were found to have normal tongue
strength but reduced tongue endurance (Robin, Somodi, &
Luschei, 1991).

Weakness and fatigue have been recognized as
common symptoms of Parkinson’s disease as early as its
original description by James Parkinson in 1817. However,
the few objective studies of muscle strength and endurance
in Parkinson’s disease have provided equivocal results.
Wilson (1925) provided examples of reduced strength in
various muscles of a few people with parkinsonism, but
indicated that the more pervasive problem is slowness of
muscle contraction and relaxation, and the inability to
maintain contractions. Schwab, England, and Peterson
(1959) argued that weakness is not a problem in Parkinson’s
disease, because normal amplitudes and directions of finger
movements were achieved voluntarily (“voluntary
ergogram” from the first dorsal interosseous muscle) with
adequate motivation, and normal movement was elicited
from electrical stimulation of the same muscle (“electronic
ergogram”). Again, they noted that endurance was a
primary problem.

Saltin and Landis (1975) reported that maximal
isometric strength of the ankle and knee flexors was similar
for 6 subjects with moderate to severe Parkinson’s disease
and healthy control subjects. Koller and Kase (1986) also
found no difference for isometric hand grip strength (maximal
effort using a dynamometer, averaged over 2 trials)
between 21 subjects with mild Parkinson’s disease and
normal subjects. However, the subjects with Parkinson’s
disease demonstrated significantly decreased maximum
isotonic muscle strength of the wrist, arm, and knee,
measured by averaging the second, third, and fourth repetitions
of maximal extension/flexion movements. The authors
concluded that the subjects with Parkinson’s disease
were weaker than the control subjects, but only for repetitive
tasks. Similarly, Tzelepis, McCool, Friedman, and
Hoppin (1988) found that 9 subjects with mild to moderate
Parkinson’s disease were not impaired for single maximum
efforts but were impaired for repetitive efforts involving the
respiratory system.

Contrary to these findings of normal isometric
strength, Yanagawa, Shindo, and Yanagisawa (1990) reported
decreased maximal strength, measured as maximal
torque produced by voluntary ankle dorsiflexion, in 15
subjects with mild to moderate Parkinson’s disease. However,
normal torques were obtained when the common
peroneal nerve was electrically stimulated. Because weakness
was apparent with voluntary activation but strength
was normal when the muscle was activated involuntarily, a
central rather than peripheral mechanism for muscle weakness
was indicated. These results indicate that muscle or
joint stiffness did not contribute to demonstrated weakness.

We are aware of only one study that systematically
examined endurance in Parkinson’s disease. Koller and
Kase (1986) defined endurance as the number of repetitions
of maximum extension/flexion movements of the wrist,
arm, and knee to fatigue or until only 50% of the maximum
strength could be generated. They found that endurance
was greater for subjects with Parkinson’s disease than for
control subjects. This measure of endurance is difficult to
interpret because the level of force (strength) and the rate of
repetitions (movement velocity) can differ between subjects.
Case study reports have clearly indicated a progressive
decline in muscle strength over time that is quite
different than that seen in healthy subjects (Schwab et al.,
1959; Wilson, 1925).

Examination of strength in the orofacial system of
people with Parkinson’s disease has been reported in a few
case studies. Barlow and Abbs (1983) described a subject
with PD who did not demonstrate tongue weakness but
exhibited instability in maintaining a steady force during
tongue elevation. Dworkin and Aronson (1986) reported
lower than normal maximum tongue “strength,” measured
by calculating the area under a force curve, in one subject
with Parkinson’s disease. Lip weakness has also been
reported in Parkinson’s disease (Netsell, Daniel, & Celesia,
1975; Wood, Hughes, Hayes, & Wolfe, 1992). Netsell et al.
(1975) studied muscle activity of the upper lip in 22 people
with Parkinson’s disease (some of whom had been treated
with thalamic surgery) and reported evidence of weakness
(reduced amplitude and duration of electromyographic
activity) in at least one representative subject. Wood et al.
(1992) used a labial force transducer to assess maximum
force generation and found weakness of the lower lip, but
not the upper lip, in 10 subjects with Parkinson’s disease, 8
of whom had dysarthria. We are unaware of studies that
examined endurance in the orofacial system in people with
Parkinson’s disease.

In  the  present  investigation,  we  tested  strength  and 
endurance  of  the  tongue  and  hand  in  19  people  with  mild  to 
moderate  idiopathic  Parkinson’s  disease  and  19  control 
subjects  matched  for  physical  characteristics.  Assessing 
hand  function  was  deemed  informative  as  an  indicator  of 
general  muscle  functioning.  Individuals  with  Parkinson’s 
disease  may  demonstrate  differential  impairment  of  the 
extremities  and  midline  structures.  The  extant  literature  in 
Parkinson’s  disease  has  not  addressed  specifically  orofacial 
strength  and  endurance  in  relation  to  speech.  To  address 
possible  relations  between  general  tongue  function  and 
speech  production,  speech  samples  from  the  subjects  were 
evaluated  by  experienced  speech-language  pathologists  for 
severity  of  articulatory  imprecision  and  overall  speech 
defectiveness.  In  addition,  speech  rate  was  calculated  from 
an  acoustic  record. 

Method 

Subjects 

Subjects were 19 adults diagnosed with idiopathic
Parkinson’s disease recruited from the Movement Disorders
Clinic at the University of Iowa Hospitals and
Clinics.1,2 Individual data pertaining to physical characteristics
and disease severity for the subjects with Parkinson’s
disease are provided in Table 1. The subjects were in mild
to moderate stages of Parkinson’s disease as judged on a
modified Hoehn and Yahr scale (1967; Fahn et al., 1987) by
a neurologist on the same day as data collection. Mild
disease (Stages 1 or 2) was present in 12 subjects, mild-to-moderate
(Stage 2.5) in 5, and moderate (Stage 3) in 2. The
subjects had no neurologic or speech disorders other than
those associated with Parkinson’s disease. Sixteen subjects
with Parkinson’s disease were taking antiparkinsonism
medications although none experienced clinical fluctuations
in their motor signs. Unfortunately, we were not able
to coordinate data collection with the drug cycle because of
scheduling constraints.

1 In our original presentation of data (Lorell, Solomon, Robin,
Somodi, Luschei, & Rodnitzky, 1992), 23 subjects were included. For this final
analysis, 2 subjects were eliminated because English was not their first language,
1 for having a history of drug abuse, and 1 for being in a severe stage of the disease
(Hoehn & Yahr, Stage 4).

2 Two of these subjects were described previously in a preliminary
report [Subjects P-L ("Mrs. S.") and P-Q ("Mrs. H."); Solomon et al., 1993].

Nineteen neurologically normal adults were recruited
from the community to match, one-to-one, subjects
with Parkinson’s disease for sex and age (within 3 years).
In addition, subjects were matched as closely as possible for
weight (all were within 5 kg with the exception of subject
pair O) and height (within 10 cm except for subject pairs A
and R). These variables have been found to correlate with
strength in various skeletal muscles (Burke, Tuttle, Thompson,
Janey, & Weber, 1953; Collumbine, Bibile,
Wikramanayake, & Watson, 1950; Larsson & Karlsson,
1978; Petrofsky & Lind, 1975; Robin, Somodi, & Luschei,
1991). Control subjects had negative histories for neurologic,
speech, or language disorders, and were not taking
medications that would affect motor performance. All
subjects spoke General American English as their native
language.
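
The matching criteria described above can be expressed as a simple check on each patient-control pair. The sketch below is illustrative only: the `Subject` class, its field names, and the example values are assumptions made for this example, and the tolerances are the ones stated in the text (age within 3 years, weight within 5 kg, height within 10 cm).

```python
from dataclasses import dataclass


@dataclass
class Subject:
    sex: str
    age_years: float
    weight_kg: float
    height_cm: float


def match_report(patient, control, max_age_diff=3.0,
                 max_weight_diff=5.0, max_height_diff=10.0):
    """Report which matching criteria a patient-control pair satisfies."""
    return {
        "sex": patient.sex == control.sex,
        "age": abs(patient.age_years - control.age_years) <= max_age_diff,
        "weight": abs(patient.weight_kg - control.weight_kg) <= max_weight_diff,
        "height": abs(patient.height_cm - control.height_cm) <= max_height_diff,
    }


# Hypothetical pair: matched for sex, age, and weight, but not for height.
report = match_report(Subject("F", 68, 70.0, 165.0),
                      Subject("F", 70, 74.0, 180.0))
```

Returning a per-criterion report rather than a single pass/fail value mirrors the text, where sex and age were matched for every pair while weight and height were matched "as closely as possible" with a few named exceptions.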

Procedures 

The Iowa Oral Performance Instrument (IOPI)
was used to assess strength and endurance. The IOPI has
been described in detail previously (Robin et al., 1991;
Robin, Goel, Somodi, & Luschei, 1992). In brief, the IOPI
measures pressure exerted upon a small, air-filled bulb, and
displays the result digitally (in kPa) or by a multi-light LED
display. For the tongue, a small plastic bulb is placed
against the hard palate immediately posterior to the alveolar
ridge, and the subject pushes against the bulb in a rostral
direction with the anterior portion of the tongue dorsum.
For the hand, the subject grips a hand bulb which is placed
in the palm of the preferred hand.

To measure maximum strength, subjects squeezed
the bulb as hard as possible. The best of two trials was taken
as the maximum. Hand strength was determined first, then
tongue strength. Following the maximum strength maneuvers,
endurance of the hand and then the tongue (one trial
each) was measured. Subjects were instructed to maintain
50% of the maximum pressure as long as possible. The LED
display on the IOPI was used for visual feedback and verbal
encouragement was provided. Trials were timed with a
stopwatch, and were terminated when the subject abruptly
dropped the pressure or when 50% of the maximum pressure
could not be maintained.

A speech sample was collected from all 38 subjects.
Subjects described the Cookie Theft picture from the
Boston Diagnostic Aphasia Examination (Goodglass &
Kaplan, 1983). Later, a segment of each speech sample was
recorded onto another tape in random order. Four speech-language
pathologists with 5 or more years of clinical
experience (not the investigators involved in data collection)
rated the speech samples for articulatory precision and
overall speech defectiveness on a six-point scale (0=normal,
1=mild, 2=mild-to-moderate, 3=moderate,
4=moderate-to-severe, 5=severe). Judgements by the 4 listeners
were averaged to provide a single numeric result for each
speech sample. In addition, speech rate was determined by
measuring the acoustic waveform with the C-Speech software
program for personal computers (Milenkovic & Read,
1992). The duration of speech, with the exclusion of pauses
> 250 ms, was determined. The number of syllables was
divided by the duration of speech. This procedure resulted
in a measure of "interpause speech rate."¹
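
The interpause speech rate calculation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the representation of the sample as a list of (start, end) speech segments in seconds, and the example values are hypothetical and are not taken from the C-Speech software or from the study data.

```python
def interpause_speech_rate(n_syllables, speech_segments, pause_threshold=0.250):
    """Syllables per second with long pauses excluded (illustrative sketch).

    speech_segments: list of (start, end) times in seconds, sorted by start.
    Silences between consecutive segments that exceed pause_threshold
    (250 ms here) are removed from the duration; shorter gaps are kept.
    """
    if not speech_segments:
        raise ValueError("at least one speech segment is required")
    total = speech_segments[-1][1] - speech_segments[0][0]
    excluded = 0.0
    for (_, prev_end), (next_start, _) in zip(speech_segments, speech_segments[1:]):
        gap = next_start - prev_end
        if gap > pause_threshold:
            excluded += gap  # pause longer than 250 ms: drop it from the duration
    return n_syllables / (total - excluded)


# Hypothetical sample: a 100 ms gap (kept) and a 600 ms pause (excluded).
segments = [(0.0, 1.0), (1.1, 2.0), (2.6, 3.6)]
rate = interpause_speech_rate(12, segments)  # 12 syllables over about 3.0 s
```

With these made-up segments, the 600 ms pause is removed and the 100 ms gap is retained, so 12 syllables are divided by roughly 3.0 s of speech.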

Statistical  Analysis 

A repeated measures multivariate analysis of variance
with one within-subjects factor was used to analyze the
strength and endurance data. Two variables were included
in the analysis: structure (tongue and hand) and function
(strength and endurance). The within-subjects factor was
subject group (Parkinson and control). This analysis allowed
for paired comparisons of matched subjects. The
Wilcoxon signed-rank test was used to test for differences
between paired perceptual judgements of speech (equal
judgements for a pair were considered missing data; a
correction was conducted for tied ranks). Speech rate
between pairs of subjects was compared with a 1-sample
t-test for paired data (2-tailed probability). For all analyses,
a probability level of 0.05 was assigned.
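
As an illustration of the Wilcoxon signed-rank procedure with the two conventions mentioned above (equal judgements dropped, tied ranks corrected), a minimal pure-Python sketch follows. The function name and the example ratings are hypothetical, and the normal approximation with the standard tie-corrected variance is an assumption; the report does not state which software or exact formula was used.

```python
import math


def wilcoxon_signed_rank(x, y):
    """Return (W, n, z) for paired samples x and y (illustrative sketch).

    Pairs with equal values are discarded (treated as missing data).
    Tied absolute differences receive average ranks, and the variance
    of the rank sum is reduced by sum(t**3 - t) / 48 over tie groups.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        raise ValueError("all pairs are equal; the test is undefined")
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        # Exact float comparison is assumed safe for coarse rating scales.
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        t = j - i + 1
        tie_term += t ** 3 - t
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    w = min(w_plus, w_minus)
    mean = n * (n + 1) / 4
    var = n * (n + 1) * (2 * n + 1) / 24 - tie_term / 48
    z = (w - mean) / math.sqrt(var)
    return w, n, z


# Hypothetical averaged ratings for five matched pairs (not study data).
control = [0.0, 0.5, 1.0, 0.25, 0.0]
parkinson = [0.5, 0.5, 0.5, 1.0, 1.0]
w, n, z = wilcoxon_signed_rank(control, parkinson)
```

In this made-up example one pair is equal and is dropped, leaving four pairs, two of which share the same absolute difference and therefore share an average rank.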

Results 

Measures of tongue and hand strength and endurance
for each subject are provided in Table 2. A statistically
significant difference between the subject groups was
found when all variables were considered [F(2,17) = 4.393;
p = 0.029]. The difference was due to strength
[F(2,17) = 4.645; p = 0.025], not endurance [F(2,17) = 0.359;
p = 0.704]. A significant difference between the structures
(hand and tongue) was found for both strength and endurance
[F(2,17) = 121.9; p = 0.0001]; both were greater for the


Figure 1. Differences in tongue strength (pressure, in kPa) plotted for
matched pairs of subjects (Control-Parkinson). Positive differences indicate
that the control subjects exhibited greater pressures than did subjects with
Parkinson's disease.


¹Measures of speech rate that exclude pauses may provide more
meaningful information for the speed of speech articulation than if pauses are
included (Alp, 1968; Hammen, 1990; Matter & Hanson, 1966; T9 & Goff, 1966).
Recent studies in Parkinson's disease have excluded pauses from measures of
speech rate (Hammen, 1990; Solomon & Hixon, 1993).

Figure 2. Differences in tongue endurance (duration, in s) plotted for
matched pairs of subjects (Control-Parkinson). Positive differences
indicate that the control subjects exhibited greater endurances than did
subjects with Parkinson's disease.


hand. However, the interaction between subject group and
structure was not significant [F(2,17) = 0.524; p = 0.601],
indicating that the structures were not affected differentially
for the subject groups.

Differences between matched pairs of subjects for
tongue strength are illustrated in Figure 1, and for tongue
endurance in Figure 2. In the graphic displays, the result
for the subject with Parkinson's disease was subtracted
from that for the matched control subject. The data were
ordered in terms of the magnitude and direction of the
differences. Therefore, the subject order is different for the
two graphs. For tongue strength (Figure 1), it is clear that
most of the data are positive (i.e., the control subject had
greater tongue strength than the matched subject with
Parkinson's disease). For tongue endurance (Figure 2), the
data are more evenly split between positive and negative
differences.

The results for the analyses of speech are provided
in Table 3. Average judgements by the speech-language
pathologists indicated that articulatory imprecision and
overall speech defectiveness were not present (i.e., were
normal) or were mild for all but 2 subjects with Parkinson's
disease (Parkinson subjects R and S). Articulatory imprecision
was slightly but significantly greater for the subjects
with Parkinson's disease (W = 33, n = 15; Z = -1.65, p = 0.049,
after correction for tied ranks). No difference was revealed
for overall speech defectiveness (W = 63, n = 17; Z = -0.88,
p = 0.188 after correction).

Inspection  of  Figures  1  and  2  reveals  that  severity 
of  disease  (Table  1)  did  not  relate  systematically  with 
measures  of  strength  or  endurance  for  these  subjects  (Note 
placement  of  Subjects  A,  B,  and  C,  the  most  mildly 
affected,  and  Subjects  R  and  S,  the  most  severely  affected, 
on  the  graphs.).  Similarly,  severity  of  speech  disorder 

Figure 3. Tongue strength (maximal pressure) for subjects with Parkinson's
disease (P) and control subjects (C) plotted against average perceptual
judgements of articulatory imprecision (A) and overall speech deficiency
(B); 0 = normal, 1 = mild, 2 = mild-to-moderate, 3 = moderate.

Figure 4. Tongue endurance at 50% maximal pressure for subjects with
Parkinson's disease (P) and control subjects (C) plotted against average
perceptual judgements of articulatory imprecision (A) and overall speech
deficiency (B); 0 = normal, 1 = mild, 2 = mild-to-moderate, 3 = moderate.

(Table 3) bore no obvious relation to tongue strength
(Figure 3) and endurance (Figure 4). One exception to these
observations is Parkinson Subject R; he was one of the
most severely affected subjects and the differences between
his data and his matched control subject's data for tongue
and hand strength and endurance are among the greatest
(Table 2, Figures 1 and 2).

Interpause speech rate (Table 3) did not differ
between subject groups (t(18) = 1.3179; p = 0.204). Speech
rate is plotted against tongue strength in Figure 5A and
tongue endurance in Figure 5B. Graphic inspection of these
plots reveals no obvious relations for these subjects.

Figure 5. Interpause speech rate (syllables/s) for subjects with Parkinson's
disease (P) and control subjects (C) plotted against (A) tongue strength
(maximal pressure, in kPa) and (B) tongue endurance at 50% maximal
pressure (in s).


Discussion 

Subjects with mild to moderate Parkinson's disease
were found on average to have less strength in the
tongue and hand than 19 matched control subjects. Strength
was determined by the maximum pressure exerted on an
air-filled bulb. Despite the contribution of Subject Pair R to the
large difference in tongue strength between subject groups
(Parkinson Subject R had the lowest and Control Subject R
had the greatest tongue strength of all subjects), the difference
appears to be real for the subject groups in general.
Previous research generally has indicated that limb isometric
strength (i.e., maximal force, torque, or pressure generated
during a single maximal voluntary contraction) is not
reduced in Parkinson's disease (Koller & Kase, 1986; Saltin
& Landis, 1975; Tzelepis et al., 1988; for an exception, see
Yanagawa et al., 1990). However, weakness has been
reported in the tongue (Dworkin & Aronson, 1986) and lips
(Netsell et al., 1975; Wood et al., 1992).

Endurance, defined in this report as the maximum
duration for maintaining 50% of the maximum pressure, did
not differ systematically between the subject groups. This
finding was unexpected because people with Parkinson's
disease often complain of fatigue. This perception of
fatigue may relate to the muscle weakness demonstrated for
these subjects; muscle weakness corresponds with an increased
sense of effort (Gandevia, 1982; McCloskey, 1981),
a critical component of fatigue (Edwards, 1981; Enoka &
Stuart, 1985). In addition, weakness may provide a possible
explanation for the finding of normal endurance. That is,
the target pressure during the endurance task would be
lower in the subjects with Parkinson's disease than in the
control subjects. Because of the lower target pressure, the
task may require less effort and recruitment of more
fatigue-resistant motor units than when strength is normal. Koller
& Kase's (1986) finding of greater than normal endurance
values in subjects with Parkinson's disease is consistent
with this hypothesis.

The inability to maintain a target contraction is
due to a process termed "force failure." Force failure can
result from central or peripheral sites in the motor system.
As described previously, Yanagawa et al. (1990) used
voluntary contractions and nerve stimulation to demonstrate
that muscle weakness in Parkinson's disease may be
due to central and not peripheral mechanisms. Measures for
the assessment of central fatigue may provide critical
information for the understanding of fatigue in Parkinson's
disease. We have been investigating perceptions of effort
in an attempt to elucidate central fatigue in normal and
neurologically disordered individuals (Solomon et al., in
press; Solomon, Robin, Dorothy, & Luschei, 1992; Somodi,
Robin, & Luschei, 1993). Thus far, our research suggests
that neurologically normal young adults can accurately and
consistently perceive various levels of effort (Somodi et al.,
1993) such that low pressures are generated at low effort
levels and progressively higher pressures are generated at
higher effort levels. In addition, healthy young adults can
maintain a constant sense of effort that corresponds with an
exponential decline in pressure (Solomon et al., 1992).
Based on the results of the present investigation, we examined
the ability of 2 of the subjects with Parkinson's disease
(Subjects P-L and P-Q) and an additional subject with
parkinsonism to perform tasks related to the perception of
effort (Solomon et al., in press). Each of these subjects
complained of fatigue. Although the disordered subjects
could maintain constant effort levels in a manner comparable
to neurologically normal subjects (Solomon et al.,
1992), they had difficulty producing pressures that corresponded
to various effort levels (Solomon et al., in press).
We are continuing to explore strength, endurance, and
perceptions of effort in a study of people with moderate to
severe Parkinson's disease.

The relation between tongue strength and endurance
(assessed with non-speech tasks) and speech is unclear.
Most of the subjects in the present investigation,
including the control subjects, were judged to have normal
or mildly disordered speech. In fact, judgements for overall
speech defectiveness and measures of interpause speech
rate did not differ between the two subject groups, and
articulatory imprecision was only slightly greater for the
subjects with Parkinson's disease.

The 2 subjects who were judged as having mildly
to moderately impaired speech (Parkinson Subjects R and
S) had lower than normal tongue strength and endurance;
tongue function results were especially abnormal for Subject
R. In addition, their interpause speech rates were
among the fastest. This last observation contrasts with a
speculation we advanced in a previous paper, that increased
speech rate may correspond with better tongue endurance
in Parkinson's disease (Solomon et al., in press). We based
this hypothesis on the finding that "supernormal" speakers
(debaters) with fast speech rates (M = 414 words/min, or
approximately 8.9 syllables/s) had supernormal tongue
endurance (approximately 93 s at 50% of maximal pressure;
Robin et al., 1992). It should be clarified that the debaters
and control subjects in that study were asked to speak as fast
as possible while maintaining intelligibility. In contrast,
the present subjects' speech rate was determined from a task
of habitual speech rate. Apparently, substantially more data
are needed to assess whether a relationship between speech
rate and tongue endurance holds for speech disordered
populations.

Although individual subjects can be described
whose data appear to support the expectation of impaired
tongue function co-occurring with disordered speech, this
relationship clearly cannot be addressed by the current data.
Future studies of larger groups of subjects, subjects with
more severely affected speech, and changes in speech and
non-speech tongue function over time may clarify this
issue.

Abstract

Empirical orthogonal eigenfunctions are extracted
from biomechanical simulations of normal and chaotic
vocal fold oscillations. For normal phonation, two dominant
empirical eigenfunctions capture the vibration patterns
of the folds and exhibit a 1:1 entrainment. The
eigenfunctions show some correspondence to theoretical
low-order normal modes of a simplified, three-dimensional
elastic continuum, and to the normal modes of a linearized
two-mass model. The eigenfunctions also facilitate a
physical interpretation of energy transfer mechanisms in
vocal fold dynamics. Subharmonic regimes and chaotic
oscillations are observed during simulations of a lax cover,
in which case at least three empirical eigenfunctions are
necessary to capture the resulting vocal fold oscillations.
These chaotic oscillations might be understood in terms of
a desynchronization of a few of the low-order modes, and
may be related to mechanisms of creaky voice or vocal fry.
Furthermore, some of the empirical eigenfunctions captured
during complex oscillations correspond to higher-order
normal modes described in earlier theoretical work.
The empirical eigenfunctions may also be useful in the
design of lower-order models (valid over the range for
which the empirical eigenfunctions remain more or less
constant), and may help facilitate bifurcation analyses of
the biomechanical simulation.


Introduction 

With any model of a physical or physiological process,
there is always a trade-off between simplicity and completeness.
The model should be simple enough to be useful
in conceptualization and prediction, but also complete
enough to represent the process accurately.

This certainly applies to vocal fold models. Early
one-mass and two-mass models (Flanagan & Landgraf,
1968; Ishizaka & Flanagan, 1972) were simple enough to be
described in a few pages of print. They were elegant in that
they helped conceptualize the interaction between airflow
and tissue movement to produce self-oscillation. But there
is considerable doubt that they represented the geometry
and the viscoelastic properties of the vocal folds adequately
for the study of voice disorders or special vocal qualities.
More recent models by Titze and Talkin (1979), and Titze
and Alipour (in review) have enough biomechanical detail
to model the three-dimensional layered structure of vocal
fold tissue, but a heavy price is paid in terms of mathematical
complexity and speed of computation. Furthermore,
interpreting the dynamics of such intensive descriptions of
the vocal folds can be a formidable task, particularly if
irregular, chaotic vibrations occur (Titze, Baken & Herzel,
1993).

One  way  to  facilitate  the  physical  interpretation  of 
a  vibrating  structure  is  to  calculate  its  principal  modes  of 
vibration.  Sometimes,  even  complicated  vibration  patterns 
can  be  explained  by  a  relatively  small  number  of  orthogonal 
modes. 

The paper begins with an introduction to modal
analysis of vocal fold tissues, followed by a brief description
of the biomechanical simulation used in this investigation
(Titze & Alipour, in review). Next, a set of empirical
eigenfunctions extracted from simulations of both normal
and chaotic vocal fold oscillations is presented. The
physical significance of these eigenfunctions is discussed
and related to current theories of voice production and
nonlinear dynamics (e.g., airflow-tissue energy transfer
mechanisms and desynchronization mechanisms). The
empirical eigenfunctions are also compared with the modes
captured by the two-mass model (Ishizaka & Flanagan,
1972) and to modes predicted for a simplified elastic
continuum (Titze & Strong, 1975). Finally, future directions
of research using this procedure are discussed.

Modal  Analysis  of  the  Vocal  Folds 

Modal analysis is a basic technique used to analyze
many vibrating structures. Traditionally, it refers to the
process of determining the normal (natural) modes and
frequencies of a linear (or linearized) system. It is a
powerful technique because it provides a framework in
which a system can be decomposed into a set of independent
vibration patterns, each with a characteristic (although not
necessarily unique) frequency. Experimentally, these normal
modes/frequencies can be observed immediately after
a system is pulse excited, or during a forced, sinusoidal
excitation (provided the driving frequency coincides closely
enough with one of the system's natural frequencies). One
of the major limitations of this technique is that it is only
valid for linear systems. However, in practice many
systems are approximately linear for small-amplitude
oscillations.
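
To make the idea of normal modes and natural frequencies concrete, the following Python sketch solves the eigenvalue problem for an undamped, linear system of two masses joined by a coupling spring. It is a minimal illustration, not the analysis used in this paper: the function name is hypothetical, and the parameter values (in CGS units) are assumed here as commonly quoted two-mass vocal fold values rather than taken from the text.

```python
import math


def two_mass_modes(m1, m2, k1, k2, kc):
    """Natural frequencies (Hz) and mode shapes of a linear two-mass system.

    Equations of motion (no damping, no driving force):
        m1 * x1'' = -(k1 + kc) * x1 + kc * x2
        m2 * x2'' = kc * x1 - (k2 + kc) * x2
    Returns [(frequency_hz, x2_over_x1), ...], lowest frequency first.
    """
    a = (k1 + kc) / m1
    b = -kc / m1
    c = -kc / m2
    d = (k2 + kc) / m2
    trace = a + d
    det = a * d - b * c
    root = math.sqrt(trace * trace - 4.0 * det)
    modes = []
    for lam in ((trace - root) / 2.0, (trace + root) / 2.0):
        freq_hz = math.sqrt(lam) / (2.0 * math.pi)  # lam = omega**2
        ratio = (lam - a) / b  # amplitude ratio x2/x1 from the first row
        modes.append((freq_hz, ratio))
    return modes


# Assumed illustrative parameters: masses in g, stiffnesses in dyn/cm.
modes = two_mass_modes(m1=0.125, m2=0.025, k1=80000.0, k2=8000.0, kc=25000.0)
for freq_hz, ratio in modes:
    phase = "in phase" if ratio > 0 else "180 degrees out of phase"
    print(f"{freq_hz:6.1f} Hz, masses {phase}")
```

With these assumed values the lower mode has both masses moving in phase and the higher mode has them moving in opposite directions, which is the kind of decomposition into independent vibration patterns that the paragraph above describes.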

Theoretical  Normal  Modes 

The concept of normal modes and frequencies is
not new to speech science. For example, formants are
frequencies which correspond to the normal modes of the
vocal tract. They have been discussed extensively in terms
of vowel production. The same concepts have not been
exploited to the same degree for an understanding of vocal
fold movement, although the study of normal modes in
vocal fold tissues does have its beginnings. Almost two
decades ago, Titze and Strong (1975) theoretically determined
normal modes of the vocal folds. By examining a
single fold and treating it as a three-dimensional, elastic,
compressible medium, and by assuming a rectangular
parallelepiped with simple boundary conditions (anterior,
posterior, and lateral boundaries fixed; medial, superior,
and inferior boundaries free), normal modes were expressed
in terms of elementary sines and cosines. For
comparison with empirical modes to be shown later, the
theoretical modes are reviewed and briefly discussed. The


lateral displacements $\xi$ of the x-modes are given by:

$$\xi(x,y,z,t) = A\, e^{i\omega_x t}\, \cos\frac{(2n_x - 1)\pi x}{2D}\, \sin\frac{n_y \pi y}{L}\, \cos\frac{n_z \pi z}{T} \tag{1}$$

and the vertical displacements $\zeta$ of the z-modes are given
by:

$$\zeta(x,y,z,t) = B\, e^{i\omega_z t}\, \cos\frac{(2n_x - 1)\pi x}{2D}\, \sin\frac{n_y \pi y}{L}\, \cos\frac{n_z \pi z}{T} \tag{2}$$

where $n_x$, $n_y$, and $n_z$ are integers indicating the order of the
modes; $L$, $D$, and $T$ are the length, depth and thickness of the
folds, respectively; $A$ and $B$ are arbitrary constants; and
$\omega_x$ and $\omega_z$ are the radian frequencies of vibration. Any
possible y-displacements (anterior-posterior direction) are
neglected. This is based on experimental evidence that the
trajectories of vocal fold fleshpoints are mostly planar
(Baer, 1981; Saito, Fukuda, Isogai & Ono, 1981; Saito,
Fukuda, Kitahira, Isogai, Tsuzuki, Muta, Takyama, Fujika,
Kokawa & Makino, 1985).

In order to distinguish the modes, the order indices
($n_x$, $n_y$, and $n_z$) must be specified and the modes need to be
identified as either x or z modes (the assumption of compressible
tissue allows the decoupling of such modes). In
practice, the $n_x$ index is usually not specified because the
standing wave pattern governed by $n_x$ (the first cosine term)
is assumed to be constant (e.g., the likelihood of reflections
from the fixed lateral boundary is small because of high
attenuation in the thyroarytenoid muscle). Thus, following
nomenclature introduced previously (Titze & Strong, 1975;
Titze, 1976; Titze, 1988), the modes are designated as either
x-$n_y n_z$ or z-$n_y n_z$ modes; the x-10 mode, for example, has
$n_y = 1$ and $n_z = 0$. Conceptually, the $n_y$ and $n_z$ indices
indicate how many half-wavelengths occur along the longitudinal
and vertical dimensions, respectively.
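
The role of the mode indices can be checked numerically with a short Python sketch that evaluates only the spatial factor of an x-mode. It assumes the separable cosine-sine-cosine form implied by the boundary conditions stated above (lateral, anterior, and posterior boundaries fixed; medial, superior, and inferior boundaries free). The function name and the fold dimensions are hypothetical and chosen purely for illustration.

```python
import math


def mode_shape(x, y, z, n_x, n_y, n_z, L, D, T):
    """Spatial factor of a theoretical mode (time factor and amplitude omitted).

    x: medial (0) to lateral (D); y: anterior (0) to posterior (L);
    z: inferior (0) to superior (T). Assumed separable form:
        cos((2 n_x - 1) pi x / (2 D)) * sin(n_y pi y / L) * cos(n_z pi z / T)
    """
    return (math.cos((2 * n_x - 1) * math.pi * x / (2 * D))
            * math.sin(n_y * math.pi * y / L)
            * math.cos(n_z * math.pi * z / T))


# Hypothetical dimensions in cm (illustration only).
L, D, T = 1.0, 0.5, 0.3
mid_y = L / 2  # evaluate at mid-length on the medial surface (x = 0)

# x-10 mode (n_y = 1, n_z = 0): bottom and top move together.
x10_bottom = mode_shape(0.0, mid_y, 0.0, 1, 1, 0, L, D, T)
x10_top = mode_shape(0.0, mid_y, T, 1, 1, 0, L, D, T)

# x-11 mode (n_y = 1, n_z = 1): bottom and top move in opposite directions.
x11_bottom = mode_shape(0.0, mid_y, 0.0, 1, 1, 1, L, D, T)
x11_top = mode_shape(0.0, mid_y, T, 1, 1, 1, L, D, T)

# Fixed lateral boundary: displacement vanishes at x = D.
lateral = mode_shape(D, mid_y, 0.0, 1, 1, 0, L, D, T)
```

Under these assumptions the x-10 values at the bottom and top of the medial surface have the same sign, while the x-11 values have opposite signs, corresponding to one half-wavelength along the vertical dimension.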

A few of the lower-order modes are shown in
Figure 1. Figure 1a shows a superior view (upper) and
coronal view (lower) of the x-10 mode. From the superior
view, the commonly observed oval glottis is displayed.
From the coronal view, all the lateral tissue displacements
are in phase along the vertical dimension. An x-11 mode is
displayed in Figure 1b. In the coronal view, the tissue at the
top of the folds is 180 degrees out of phase with the tissue
at the bottom of the folds. Variations of these lowest-order
x modes describe some of the most commonly observed
vocal fold vibration patterns (Moore & Von Leden, 1958;
Hirano, 1975). Indeed, an appropriate combination of these
modes is known to be essential for self-oscillation of the
folds (Titze, 1988). An x-21 mode is illustrated in Figure
1c, and a z-10 mode in Figure 1d (sagittal view on top).
These modes are not as easily observed because (1) the
superior aspect, which is almost always used in high-speed
films and videostroboscopy, is not ideal for viewing z-modes
(sagittal or coronal views would be better), and (2)

Figure 1. A few of the low-order, theoretical, normal modes are shown from superior and coronal views: (a) x-10, (b) x-11, (c) x-21. A sagittal and coronal
view is shown for (d) z-10. An artificial separation of left and right folds is used in order to display the true theoretical modes without deformation from


higher-order modes (such as the x-21 mode) usually have
smaller vibrational amplitudes and are often not excited.

Normal Modes In Low-Order Models

Not long after these normal modes were introduced,
Titze (1976) claimed that (1) self-oscillation of the
vocal folds consists of "approximately linear combinations
of the normal modes," and (2) that "self-oscillation ...
occurs at ... one of the natural frequencies of oscillation,
usually the lowest." Titze demonstrated the plausibility of
these concepts through an analysis of the two-mass model
(Ishizaka and Flanagan, 1972). The normal modes of the
two-mass model were shown to be analogous to the lowest
order x-modes of the simplified elastic continuum; that is,
the mode where the two masses are in phase is similar to the
x-10 mode, and the mode where the two masses are 180
degrees out of phase is similar to the x-11 mode. The ability
of the two-mass model to self-oscillate can be explained, in
large measure, by the existence of these two modes, which
facilitate energy transfer from the airflow to the tissue
(Stevens, 1977; Broad, 1979; Titze, 1988). For "typical"
Ishizaka and Flanagan (1972) parameters, the natural frequencies
of the normal modes are 120 Hz and 201 Hz,
respectively (Titze, 1976).

Entrainment of the Modes

Self-oscillation during normal phonation also involves
a 1:1 "entrainment" of the modes. Entrainment is a
phenomenon in which a nonlinear coupling of system
variables causes the natural frequencies of the system to
shift so as to be related by an integer ratio. For example, a
1:1 entrainment in the two-mass model means that both
modes oscillate at the same frequency. Such an entrainment
has been observed over a wide range of parameters in the
two-mass model (Herzel, Steinecke, Mende & Wermke,
1991). As predicted by Titze (1976), this entrainment
occurs at a frequency very close to the natural frequency of
the lowest-order mode. Over a certain range of parameters
(e.g., those corresponding to low stiffness of the upper mass
and weak coupling of the masses), the breakdown or
desynchronization of this 1:1 entrainment has also been
observed in the two-mass model (Herzel et al., 1991). In
such parameter regions, various nonlinear phenomena have
been observed including subharmonic regimes, beating-like
toroidal oscillations (i.e., low-frequency modulations),
and chaotic motion (Herzel et al., 1991).

Subharmonics, low-frequency modulations, and
chaos are also commonly observed in patients with vocal
disorders (Herzel & Wendler, 1991; Baken, 1991; Herzel,
Berry, Titze & Saleh, in review) and during infant cries
(Mende, Herzel & Wermke, 1990). Consequently, this
desynchronization of the modes is believed to be an essential
mechanism of many vocal disorders (Titze et al., 1993;
Herzel et al., in review).

Experimental Studies of Normal Modes

To date, most of the discussion of normal modes
in vocal fold tissues has been in a theoretical sense. Direct
measurement of the modes has proven problematic, partially
because of the small dimensions of the vocal folds (on
the order of 1 cm). Traditional modal analysis in which
accelerometers are used to trace trajectories at various
locations on a structure would undoubtedly yield unsatisfactory
results. The number of accelerometers that could be
placed on the folds would be limited, the weight of accelerometers
might alter the modes significantly, and the
ability to firmly attach a device to the elastic tissue of the
folds would be limited.

Perhaps new optical techniques and high-speed
video show the greatest promise for an adequate modal
analysis of the folds. Indeed, the most
common way to observe vocal fold vibrations is through
high-speed films and videostroboscopy. As noted previously,
many vibration patterns similar to the theoretical
normal modes have been observed using these techniques.
However, vibration patterns are usually a combination of
several modes, so the vibration patterns observed are rarely
oscillations of a single mode. Still, if these optical techniques
can be used to trace trajectories of vocal fold
fleshpoints, empirical methods (e.g., the method of
empirical eigenfunctions to be described) can be used to
decompose the vibration patterns into distinct, orthogonal
modes of vibration.

Although little experimental work has been done
to quantify the normal modes of the vocal folds, some
attempts have been made to measure the resonant frequencies
of the folds (Kaneko, Masuda, Shimada, Suzuki,
Hayasaki & Komatsu, 1986). Kaneko et al. (1986) used
ultrasonic techniques to measure these resonant frequencies.
Measurements were made immediately following a
mechanical pulse-excitation to the thyroid cartilage, and
immediately before the patients began phonating. Kaneko
et al. (1986) observed two dominant resonances of the folds
below 625 Hz. Interestingly, the lower resonance corresponded
well to the phonation frequency that followed.

Modes  In  the  Biomechanical  Simulation 

From  the  start,  it  is  acknowledged  that  there  is  no 
substitute  for  direct  measurement  of  the  modes  in  vocal  fold 
tissues.  However,  until  this  becomes  feasible,  there  are 
additional  theoretical  approaches  that  can  be  used  to  inves¬ 
tigate  these  modes,  particularly  with  the  help  of  a  biome¬ 
chanical  simulation  of  vocal  fold  movement  (Titze  and 
Alipour,  in  review).  The  simulation  uses  a  finite  element 
approach  to  the  solution  of  viscoelastic  waves  in  a  con¬ 
tinuum  (Titze  and  Talkin,  1979).  A  series  of  experimental 
studies  have  been  performed  to  quantify  the  elastic  proper¬ 
ties  of  vocal  fold  tissues  (Alipour  &  Titze  1985, 1991),  and 
more  work  in  this  area  is  in  progress.  Indeed,  the  develop¬ 
ment  of  this  simulation  has  been  an  effort  to  integrate  many 
independent  measurements  and  theoretical  considerations 
into one coherent “picture” of vocal fold vibration.

Furthermore,  with  the  simulation  it  is  easy  to  trace 
the  trajectories  of  points  interior  to  the  folds.  With  optical 
methods,  only  surface  points  can  be  tracked  (Baer,  1981). 
Nevertheless,  direct  measurement  of  the  modes  in  vocal 
fold tissues (perhaps with x-ray pellet techniques as described
by Saito et al., 1985) will be invaluable, and may be
a  key  to  future  refinements  of  the  biomechanical  simula¬ 
tion. 


Empirical  Eigenfunctions 

As  pointed  out  earlier,  traditional  modal  analysis 
is  limited  by  the  fact  that  it  is  only  valid  for  linear  systems. 
However, there are many nonlinearities associated with the
vocal  folds.  One  of  these  is  the  nonlinear  stress-strain 
curves  of  vocal  fold  tissues.  Another  is  the  nonlinear 
pressure-flow  relation  in  the  glottis.  A  third  is  the  nonlinearity 
associated  with  vocal  fold  collision.  While  for  small 
transient  oscillations  these  nonlinearities  might  be  ne¬ 
glected,  self-sustained  oscillation  depends  critically  on  at 
least  one  nonlinear  constitutive  equation.  Indeed,  for  many 
vocal  fold  configurations  linear  dynamics  is  not  even  ap¬ 
proximately  true. 

However,  the  method  of  empirical  orthogonal 
eigenfunctions  (Lorenz,  1956)  has  been  used  for  many  years 
to extract physically meaningful structures from nonlinear
systems.  For  example,  Lumley  (1967)  advocated  the  tech¬ 
nique  as  a  way  to  extract  “coherent  structures”  from  a 
turbulent  flow.  In  recent  years,  the  method  has  become  a 
popular  technique  in  a  variety  of  problems  in  fluid  dynamics 
(Sirovich,  1987;  Aubry,  Guyonnet  &  Lima,  1991;  Deane, 
Kevrekidis, Karniadakis & Orszag, 1991; Armbruster,
Heiland,  Kostelich  &  Nicolaenko,  1992).  The  method  is  an 
application  of  a  general  technique  familiar  to  many  disci¬ 
plines,  and  has  also  been  referred  to  as  the  singular  value 
decomposition  (Golub  &  Van  Loan,  1983),  singular  spec¬ 
trum  analysis  (Vautard,  Yiou  &  Ghil,  1992),  principal- 
components analysis (Zahorian & Rothenberg, 1981), principal
factor analysis (Johnson & Wichern, 1982), the
Karhunen-Loève expansion (Fukunaga, 1972), the proper
orthogonal  decomposition  (Lumley,  1967),  and  the  bi- 
orthogonal  decomposition  (Aubry,  Guyonnet  &  Lima,  1991). 

Furthermore,  Breuer  and  Sirovich  (1991)  have 
recently shown that, for a general class of linear systems, the
empirical  eigenfunctions  actually  reduce  to  the  linear  nor¬ 
mal  modes.  The  ability  to  extract  physically  meaningful 
structures  from  both  linear  and  nonlinear  systems  makes  the 
method  of  empirical  orthogonal  eigenfunctions  a  particu¬ 
larly  useful  tool  for  analyzing  vocal  fold  movement.  In  the 
case  of  small-amplitude  vibrations  for  which  the  tissue 
stress-strain  curves  are  approximately  linear,  the  empirical 
eigenfunctions  should  be  related  to  the  normal  modes  of 
vocal  fold  tissues.  For  larger  amplitude  vibrations  for  which 
tissue  nonlinearities  become  important,  the  eigenfunctions 
should  appear  as  distortions  of  the  normal  modes,  i.e.,  a 
reflection  of  the  new  nonlinear  phenomena  (Breuer  and 
Sirovich,  1991). 

Moreover,  the  statistical  nature  of  this  technique 
makes it well-suited for the present investigation. In a sense,
the  method  is  “blind”  to  all  the  complexities  of  the  biome¬ 
chanical  simulation  (e.g.,  nonlinearities  in  stress-strain 
curves,  complex  geometry  of  the  folds,  layered  tissue 
structure,  tissue  incompressibility  which  induces  a  coupling 
between lateral and vertical modes, aerodynamic forces,
collision forces, viscous losses). Such complexities forbid
an analytical solution of the modes, but present no difficulties
for the method of empirical eigenfunctions.

The  method  of  empirical  orthogonal 
eigenfunctions  differs  from  a  traditional  normal  mode 
analysis in that it does not determine “modes” directly from
the  equations  of  motion.  Rather,  “modes”  are  determined 
by  statistical  correlations  of  the  output  variables,  i.e.,  a 
covariance  matrix  is  generated  and  eigenvectors  are  com¬ 
puted.  The  eigenvectors  are  orthogonal  and  are  guaranteed 
to  be  optimal  in  the  sense  that  they  regenerate  the  output 
data  with  minimum  least-square  error  (for  any  arbitrary 
number  of  eigenvectors).  Unlike  a  normal  mode  analysis, 
the  method  of  empirical  eigenfunctions  does  not  calculate 
all  the  possible  modes  of  a  system.  Rather,  it  only  extracts 
those “modes” which are excited. For the present investigation,
the excited “modes” are the focus of interest, and are
used as a tool for interpreting vocal fold dynamics during
self-oscillation.

Procedure

Trajectories  from  the  Simulation 

Empirical  eigenfunctions  were  calculated  based 
on  the  output  of  a  biomechanical  simulation  of  vocal  fold 
movement  (Titze  and  Alipour,  in  review).  The  simulation 
was  run  as  part  of  a  complete  speech  synthesis  system, 
including  sub-  and  supraglottal  systems.  The  biomechani¬ 
cal  model  of  the  folds  consists  of  nine  longitudinal  layers  as 
shown  in  Fig.  2,  where  the  posterior  edge  of  the  folds  is  in 
the  foreground.  Anterior  and  posterior  boundaries  are 
fixed.  Each  layer  consists  of  32  finite  elements  (triangles) 


Figure  2.  A  view  of  the  biomechanical  simulation  immediately  before 
glottal  closure  is  shown,  with  the  posterior  edge  of  the  folds  in  the 
foreground.  There  are  nine  layers  positioned  along  the  anterior-posterior 
length. 


or  26  nodes  (fleshpoints),  as  shown  on  the  left  side  of  Fig. 
3.  The  elements  which  correspond  to  the  body  (or  muscle) 
are  marked  “B”,  the  elements  which  correspond  to  the 
ligament are marked “L”, and the elements which correspond
to the cover (or mucosa) are marked “C”. Each of
these  regions  possesses  distinct  elastic  properties.  Three 
nodes  per  layer  are  placed  on  a  fixed  lateral  boundary.  Thus, 
there  are  207  nodes  per  fold  (9  layers  x  23  nodes/layer) 
which  are  free  to  oscillate.  As  in  earlier  investigations, 
lateral  and  vertical  motions  are  allowed,  but  no  movement 
along  the  anterior-posterior  direction.  With  two  degrees  of 
freedom  per  node,  there  are  414  total  degrees  of  freedom  if 
left  and  right  folds  are  symmetric,  and  828  if  asymmetric. 
Although  the  simulation  is  equipped  to  handle  asymmetric 
folds,  all  the  runs  for  this  analysis  employed  left-right 
symmetry. 
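
The degree-of-freedom count above follows directly from the mesh figures quoted in the text. The short check below only restates that arithmetic; the variable names are illustrative and are not taken from the simulation code.

```python
# Mesh counts quoted in the text (variable names are illustrative).
layers = 9                    # longitudinal layers per fold
nodes_per_layer = 26          # nodes (fleshpoints) per layer
fixed_nodes_per_layer = 3     # nodes on the fixed lateral boundary
dof_per_node = 2              # lateral (x) and vertical (z) motion only

free_nodes_per_layer = nodes_per_layer - fixed_nodes_per_layer
free_nodes_per_fold = layers * free_nodes_per_layer
dof_symmetric = free_nodes_per_fold * dof_per_node
dof_asymmetric = 2 * dof_symmetric   # left and right folds tracked separately

print(free_nodes_per_layer, free_nodes_per_fold, dof_symmetric, dof_asymmetric)
```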

Nodal  trajectories  for  parameters  corresponding 
to normal phonation are shown on the right side of Fig. 3.
The  trajectories  are  taken  from  the  fifth  longitudinal  layer 
(the  layer  mid-way  between  anterior-posterior  boundaries), 
which is the layer with the most lateral movement. Qualitative
similarities exist between these trajectories and
fleshpoint trajectories observed experimentally (Baer, 1981;
Saito et al., 1981; Saito et al., 1985). The x and z coordinates
from the trajectories of each of the 207 nodes were used as
the  input  for  calculating  the  covariance  matrix,  and  the 
resulting  eigenfunctions.  Although  the  simulation  was  run 
at  a  sampling  rate  of  20  kHz,  the  nodal  coordinates  were 
only  saved  at  a  rate  of  5  kHz,  which  was  found  to  be 
sufficient.  Frequencies  above  about  1  kHz  were  essentially 
non-existent  in  the  trajectories  (measured  on  a  power 
spectrum,  they  were  at  least  40  dB  below  the  strongest 
frequency). 
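
The sufficiency of the 5 kHz storage rate can be checked along the lines sketched below. This is a minimal illustration, not the procedure used in the study: the test signal (a 120 Hz fundamental with two weaker harmonics) is a hypothetical stand-in for one nodal coordinate, and the helper `level_above` is introduced here only for the example. Only the two sampling rates, the 1 kHz limit, and the 40 dB criterion come from the text.

```python
import numpy as np

fs_sim = 20_000              # simulation sampling rate in Hz (from the text)
fs_saved = 5_000             # rate at which coordinates were saved (from the text)
step = fs_sim // fs_saved    # keep every 4th sample

# Hypothetical stand-in for one nodal coordinate over one second.
# It has no content above 2.5 kHz, so plain decimation does not alias here.
t = np.arange(fs_sim) / fs_sim
x = (np.sin(2 * np.pi * 120 * t)
     + 0.3 * np.sin(2 * np.pi * 240 * t)
     + 0.1 * np.sin(2 * np.pi * 360 * t))

x_saved = x[::step]

def level_above(signal, fs, f_cut):
    """Strongest spectral level above f_cut, in dB relative to the overall peak."""
    power = np.abs(np.fft.rfft(signal * np.hanning(len(signal)))) ** 2
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    return 10.0 * np.log10(power[freqs > f_cut].max() / power.max())

rel_db = level_above(x_saved, fs_saved, 1000.0)
print(len(x_saved), rel_db < -40.0)
```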


Figure  3.  A  coronal  view  of  the  fifth  longitudinal  layer.  On  the  left  side  of 
the figure, the 32 elements/layer are displayed and distinguished as
corresponding  to  the  body  (“B"),  the  cover  ("C"),  or  the  ligament  ("L"). 
On  the  right  side  of  the  figure,  trajectories  of  vocal  fold  fleshpoints  are 
shown for parameters corresponding to normal phonation.

Calculation of Empirical Eigenfunctions

First, the nodal coordinates $R_i$ were separated into
mean and oscillatory components:

$$R_i(t) = \bar{R}_i + r_i(t), \qquad i = 1, 2, \ldots, 414 \qquad (3)$$

where  the  bar  denotes  a  mean  value.  The  mean  represents 
the  dynamic  equilibrium  of  the  system,  and  the  remaining 
oscillatory  component  represents  the  time-varying  dis¬ 
placements  about  this  equilibrium.  A  covariance  matrix 
was  generated  using  the  time-varying  displacements: 

$$S_{ij} = \frac{1}{N} \sum_{k=1}^{N} r_i(t_k)\, r_j(t_k), \qquad i, j = 1, 2, \ldots, 414 \qquad (4)$$

where $t_k$ are the discrete times at which the coordinates are
sampled, and $N$ is the total number of time samples. The
eigenvectors of the covariance matrix correspond to the
empirical eigenfunctions. At any time $t_k$, the nodal displacements
may be expressed as a linear combination of the
empirical eigenfunctions $\phi_j$ (Deane et al., 1991):

$$r_i(t_k) = \sum_{j=1}^{414} a_j(t_k)\, \phi_j(i), \qquad i = 1, 2, \ldots, 414 \qquad (5)$$

where $\phi_j(i)$ is the i-th component of the j-th eigenfunction
and $a_j(t_k)$ is the temporal coefficient of the j-th eigenfunction
at time $t_k$. The temporal coefficients may be computed by
projecting  the  eigenfunctions  onto  the  time-varying 
displacements: 

$$a_j(t_k) = \sum_{i=1}^{414} r_i(t_k)\, \phi_j(i), \qquad j = 1, 2, \ldots, 414 \qquad (6)$$
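
The steps in Eqs. (3) to (6) can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the code used in the study: the function name `empirical_eigenfunctions`, the array layout (time samples in rows, nodal coordinates in columns), the 1/N normalization of the covariance matrix, and the six-coordinate synthetic test data are all choices made for this example.

```python
import numpy as np

def empirical_eigenfunctions(R):
    """R has shape (N, M): N time samples of M nodal coordinates."""
    R_mean = R.mean(axis=0)               # Eq. (3): mean (dynamic equilibrium)
    r = R - R_mean                        # Eq. (3): oscillatory component
    S = r.T @ r / r.shape[0]              # Eq. (4): M x M covariance matrix
    eigvals, eigvecs = np.linalg.eigh(S)  # symmetric solver, ascending order
    order = np.argsort(eigvals)[::-1]     # re-sort in descending order
    lam = eigvals[order]
    phi = eigvecs[:, order]               # column j holds eigenfunction phi_j
    a = r @ phi                           # Eq. (6): temporal coefficients a_j(t_k)
    return R_mean, lam, phi, a

# Hypothetical data: two spatial patterns oscillating in quadrature, plus noise.
rng = np.random.default_rng(0)
N, M = 5000, 6
t = np.arange(N) / 5000.0
p1 = rng.standard_normal(M)
p2 = rng.standard_normal(M)
R = (1.0
     + np.outer(np.sin(2 * np.pi * 120 * t), p1)
     + 0.5 * np.outer(np.cos(2 * np.pi * 120 * t), p2)
     + 0.01 * rng.standard_normal((N, M)))

R_mean, lam, phi, a = empirical_eigenfunctions(R)
explained = lam / lam.sum()               # normalized eigenvalues
R_rebuilt = R_mean + a @ phi.T            # Eq. (5) with all M eigenfunctions
```

With all eigenfunctions retained, Eq. (5) reproduces the data exactly; truncating the sum to the first few columns of `phi` gives the least-square-optimal approximation described earlier.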

The  temporal  coefficients  themselves  may  be 
thought  of  as  temporal  eigenfunctions,  and  correspond  to 
the  eigenvectors  of  a  temporal  correlation  matrix  which 
may also be generated from the original data (Sirovich,
1987).  Both  the  spatial  and  temporal  eigenfunctions  are 
orthogonal,  and  reveal  distinct  features  of  the  dynamics  of 
the  system.  The  spatial  eigenfunctions  (sometimes  referred 
to  as  “topos”,  Aubry  et  al.,  1991)  reveal  topological  patterns 
in  the  data  and  are  analogous  to  the  normal  modes  of  linear 
systems.  The  temporal  eigenfunctions  (sometimes  referred 
to  as  “chronos”,  Aubry  et  al.,  1991)  reveal  information 
about  possible  entrainment  of  the  modes,  and  capture  the 
frequencies  at  which  the  modes  oscillate. 
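
The link between the spatial and temporal eigenfunctions can be illustrated with a singular value decomposition of the displacement matrix. The sketch below uses random data and illustrative names (`topos`, `chronos`); the 1/N scaling of both matrices is an assumption made so that they share the same nonzero eigenvalues.

```python
import numpy as np

# Hypothetical displacement matrix: N time samples (rows) of M coordinates.
rng = np.random.default_rng(1)
N, M = 200, 8
r = rng.standard_normal((N, M))
r -= r.mean(axis=0)                 # remove the mean, as in Eq. (3)

# r = U diag(s) Vt, with orthonormal columns in U and orthonormal rows in Vt.
U, s, Vt = np.linalg.svd(r, full_matrices=False)
topos = Vt.T                        # spatial eigenfunctions, one per column
chronos = U                         # unit-norm temporal eigenfunctions
lam = s ** 2 / N                    # shared eigenvalues

S = r.T @ r / N                     # spatial covariance matrix (M x M)
C = r @ r.T / N                     # temporal correlation matrix (N x N)
a = r @ topos                       # temporal coefficients, as in Eq. (6)
```

The temporal coefficients are the temporal eigenfunctions scaled by the singular values, so either matrix can be diagonalized, whichever is smaller.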

Each  pair  of  spatio-temporal  eigenfunctions  has  a 
corresponding  eigenvalue,  which  quantifies  the  degree  to 
which  the  eigenfunctions  can  regenerate  the  nodal  trajecto¬ 
ries  (in  terms  of  variance).  Often  just  a  few  eigenfunction 


pairs capture the essential dynamics of a system (Deane et
al., 1991), which facilitates a reduction of the system as well
as a physical interpretation of the dynamics.

All  the  covariance  matrices  calculated  in  this 
study  were  generated  with  one  second  of  stationary  output 
(5000 time frames). Initial transients and other nonstationary
segments  were  not  used  in  calculating  the  covariance 
matrices  and  resulting  empirical  eigenfunctions.  The  domi¬ 
nant  vibration  frequencies  of  these  modes  ranged  between 
80 and 160 Hz, so 80 to 160 cycles were used in calculating
the  modes. 

Results 

Normal Phonation

First  of  all,  we  consider  the  results  of  the  analysis 
for typical parameters corresponding to “normal” phonation.
The normalized eigenvalues computed for this simulation
are shown in Table 1. The eigenvalues are shown in
descending  order.  The  far  right  column  shows  a  cumulative 
sum  of  the  eigenvalues.  From  this  table,  we  see  that  the  first 
eigenfunction  explains  about  72%  of  the  variance  of  the 
nodal  trajectories,  and  the  second  eigenfunction  about  26% 
of  the  variance.  Together  the  first  two  eigenfunctions 
explain  approximately  98%  of  the  variance,  suggesting  the 
dominance of just a few primary modes. These results were
consistent  over  a  wide  range  of  elastic  constants  and 
subglottal  pressures. 


Table 1.
Normalized eigenvalues for modes of "normal" phonation
(Young's modulus of the cover, Ec = 2 kPa).

Mode Number    λj (%)    Cumulative sum of λj (%)
1              72.5      72.5
2              25.2      97.7
3               1.5      99.2
4               0.5      99.7
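
The cumulative column of Table 1 is a running sum of the normalized eigenvalues, which the short check below reproduces from the tabulated values. The 97% threshold is an arbitrary choice made for illustration.

```python
import numpy as np

# Normalized eigenvalues (%) for the first four modes, copied from Table 1.
lam_percent = np.array([72.5, 25.2, 1.5, 0.5])

cumulative = np.cumsum(lam_percent)

# Smallest number of modes whose cumulative variance reaches the threshold.
threshold = 97.0   # illustrative threshold, not taken from the text
n_modes = int(np.searchsorted(cumulative, threshold) + 1)

print(np.round(cumulative, 1), n_modes)
```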

A  coronal  view  of  the  first  eigenfunction  is  shown 
in  Fig.  4a.  Frames  1  and  2  display  maximum  and  minimum 
excursions  of  the  eigenfunction,  respectively  (solid  lines). 
The  dotted  lines  show  the  mean  coordinate  values.  By 
examining  the  motion  of  the  folds  near  the  top  of  the  glottal 
air  passage  (e.g.,  see  the  top  five  medial  nodes  which  are 
bolded  on  either  side),  one  can  note  the  correspondence  of 
this eigenfunction with the x-11 mode. That is, there are
higher and lower portions of the folds which are 180 degrees
out of phase. Consequently, this eigenfunction is largely
responsible for alternately shaping a divergent (frame 1)
and convergent (frame 2) glottis. In addition, there is
considerable vertical motion similar to the z-10 mode. This
coupling  of  x  and  z  modes  is  not  surprising  given  the 
incompressibility of the tissue (Titze, 1976). Tissue incompressibility
implies that the overall tissue volume does not
change,  so  if  the  folds  are  compressed  laterally,  they  must 
bulge  out  vertically,  and  vice  versa. 

A  coronal  view  of  the  second  eigenfunction  is 
shown in Fig. 4b. Again frames 1 and 2 show maximum and
minimum excursions of this eigenfunction. Again, note
that near the top of the folds (the region that might approximate
a rectangular parallelepiped), this eigenfunction is
qualitatively similar to the x-10 mode (Fig. 1a), and is
largely responsible for the net lateral movement of the folds
in this region.

Fig.  5  shows  the  temporal  coefficients  associated 
with  each  of  these  eigenfunctions;  the  solid  line  displays  the 
temporal  coefficients  for  the  first  eigenfunction  and  the 
dotted  line  illustrates  the  time  coefficients  for  the  second 
eigenfunction.  The  temporal  coefficients  for  both 
eigenfunctions  are  nearly  sinusoidal  with  a  sine/cosine 
relationship (mode 1 lags mode 2 by about 90 degrees). A
simple  analysis  shows  that  the  modes  are  synchronized  in 
such  a  way  that  energy  transfer  may  occur  from  the  airflow 
to  the  tissue,  enabling  self-oscillation.  Specifically,  note 
that  the  solid  line  can  be  expressed  as  sin(t).  A  maximum 
in  this  line  occurs  for  a  divergent  glottis  (see  Fig.  4a,  frame 
1),  and  a  minimum  in  the  solid  line  occurs  for  a  convergent 
glottis  (see  Fig.  4a,  frame  2).  If  Bernoulli’s  law  is  taken  as 
approximately  valid,  then  the  intraglottal  pressure  will  be 
relatively  low  for  a  divergent  glottis  and  relatively  high  for 
a  convergent  glottis  (Titze,  1988).  As  a  first-order  approxi¬ 
mation,  one  might  say  that  the  intraglottal  pressure  is  in 
phase  with  -sin(t). 


Figure  4.  A  coronal  view  of  the  two  strongest  spatial  eigenfunctions  for 
normal  phonation.  The  first  eigenfunction  is  shown  in  (a,  top)  and  the 
second eigenfunction is shown in (b, bottom). In both cases, frame 1
corresponds  to  a  maximum  excursion  of  the  eigenfunction,  and  frame  2 
corresponds  to  a  minimum  excursion. 


The  dotted  line  can  be  expressed  as  cos(t).  A 
maximum  in  this  line  occurs  when  the  folds  are  most  opened 
(Fig. 4b, frame 1), and a minimum occurs when the folds are
closed  (Fig.  4b,  frame  2).  Because  this  mode  roughly 
corresponds  to  the  net  lateral  displacement  of  the  tissue,  a 
rough  estimate  of  the  net  lateral  velocity  of  the  tissue  is 
given  by  the  time  derivative  of  cos(t),  or  -sin(t).  Thus,  an 
examination  of  the  two  dominant  spatio-temporal 
eigenfunctions  of  the  biomechanical  simulation  reveals  an 
in-phase  relationship  between  the  intraglottal  pressure  and 
the  net  tissue  velocity,  which  allows  energy  transfer  from 
the  airflow  to  the  tissue. 
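
The phase argument above can be checked numerically. The sketch below is a first-order illustration only: it takes unit-amplitude sinusoids for the two temporal coefficients, uses -sin(t) as a stand-in for the intraglottal pressure, and compares the cycle-averaged product of pressure and velocity with a hypothetical case in which the pressure is 90 degrees out of phase with the velocity.

```python
import numpy as np

# One cycle of the idealized temporal coefficients (unit amplitudes assumed).
t = np.linspace(0.0, 2.0 * np.pi, 10_000, endpoint=False)
a1 = np.sin(t)                      # mode 1: divergent/convergent shaping
a2 = np.cos(t)                      # mode 2: net lateral displacement

pressure = -a1                      # low for divergent, high for convergent glottis
velocity = np.gradient(a2, t)       # numerical d/dt of cos(t), close to -sin(t)

# Cycle-averaged power delivered to the tissue (arbitrary units).
mean_power_in_phase = np.mean(pressure * velocity)

# Hypothetical counter-case: pressure in quadrature with the velocity.
mean_power_quadrature = np.mean(np.cos(t) * velocity)

print(round(mean_power_in_phase, 3), round(abs(mean_power_quadrature), 3))
```

A positive cycle average in the first case corresponds to net energy flow from the airflow to the tissue; in the quadrature case the average is essentially zero.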

It  is  already  well-known  that  this  condition  must 
be  satisfied  if  self-oscillation  of  the  folds  is  to  occur  in  the 
presence  of  dissipation.  However,  the  important  point  is 
that  this  method  of  analysis  reduced  several  hundred  trajec- 
tories  of  the  biomechanical  simulation  to  essentially  two 
modes  of  vibration.  With  this  reduction,  the  dynamics  of  a 
biomechanical  model  with  many  degrees  of  freedom  could 
be  discussed  and  interpreted  as  easily  as  the  dynamics  of  a 
much  more  constrained,  low-order  model.  The  ability  to 
reduce  large  amounts  of  data  to  essential  dynamics  will  be 
crucial  for  understanding  more  complex  output  from  the 
biomechanical  simulation. 

As a word of caution, it should be noted that
although the biomechanical simulation was reduced to
essentially two modes of vibration for parameters corresponding
to normal phonation, the biomechanical model
was in no way reduced to a two-mass model. Even for
normal  phonation,  the  most  dominant  mode  of  the  biome¬ 
chanical  simulation  was  not  simply  a  lower-order  x-mode 
such  as  might  be  captured  by  a  two-mass  model,  but  an  x- 
mode  coupled  with  a  z-mode.  Furthermore,  although  the 
modes  of  a  two-mass  model  may  be  qualitatively  similar  to 


Figure  5.  The  two  dominant  temporal  eigenfunctions  for  parameters 
corresponding  to  normal  phonation.  The  solid  line  corresponds  to 
eigenfunction 1 and the dotted line corresponds to eigenfunction 2.

the lower-order modes of a simplified elastic continuum, it
is  questionable  whether  two  bar-shaped  masses  can  ad¬ 
equately  capture  the  smoothly  varying  shape  of  the  glottis. 
The  discontinuities  introduced  by  such  gross  spatial 
discretization  would  likely  have  an  adverse  effect  on  syn¬ 
thesis. 

Moreover, the biomechanical simulation has hundreds
of degrees of freedom which allow it to be excited into
many modes of vibration not possible for the two-mass
model. The fact that just a few of the lower-order modes are
excited for a range of parameters corresponding to normal
phonation is to be expected and might even be viewed as one
validation  of  the  biomechanical  simulation.  For  other 
parameter  configurations,  additional  modes  are  excited  in 
the  simulation.  The  study  of  these  modes  may  yield 
additional  insights  into  vocal  fold  dynamics,  and  may  have 
relevance  for  an  understanding  of  voice  disorders. 

Chaotic Oscillations

Because  the  vocal  folds  are  nonlinear  systems  with 
many  degrees  of  freedom,  bifurcations  and  chaos  should  be 
expected  for  certain  parameter  configurations  (Glass  & 
Mackey, 1988; Titze et al., 1993). Indeed, bifurcations and
chaos appear in some of the simplest models of vocal fold
vibration (Awrejcewicz, 1990; Herzel et al., 1991). They
are also observed in more complex models (Wong, Ito, Cox
& Titze, 1991), in models which incorporate left-right
asymmetries (Ishizaka & Isshiki, 1976; Wong et al., 1991;
Smith, Berke, Gerratt & Kreiman, 1992), and most recently
in finite element simulations of the folds (Titze et al., 1993).
Furthermore,  an  acoustical  analysis  of  many  types  of  rough 
voice  (e.g.,  creaky  voice,  vocal  fry,  vocal  disorders  and 
newborn  infant  cries)  reveals  an  intimate  relationship  be¬ 
tween  voice  mechanics  and  bifurcations  and  chaos  (Herzel 


Figure 6. A spectral bifurcation diagram as Ec is slowly varied from 0.35
to 0.65 kPa, in increments of 0.01 kPa every 400 ms. Transitions from a
subharmonic regime to chaos to periodic motion are displayed.


& Wendler, 1991; Herzel et al., in review; Mende et al.,
1990).

Previously,  it  has  been  suggested  that  the  origin  of 
chaos  in  vocal  fold  vibrations  may  be  the  desynchronization 
of a few of the oscillatory modes (Titze et al., 1993; Herzel
et al., in review). Specifically, creaky voice or vocal fry is
thought to be induced by a lax cover, which could lead to
the desynchronization of a few of the low-order x and z
modes (Herzel et al., in review).

To  investigate  this  hypothesis,  the  lax  cover  was 
simulated by decreasing the transverse Young's modulus of
the cover (Ec) in the biomechanical simulation. Starting at
2 kPa (which is the value of Ec used in the simulation of
“normal” phonation), Ec was gradually lowered and the
resulting acoustic output was observed. No unusual behavior
was noticed until Ec reached values below 0.6 kPa, at
which point the signal became irregular, and perceptually
rough. At 0.4 kPa, the signal became regular again, but with
a doubling of the original period (an “octave jump”), which
appeared as alternating high and low amplitudes in the
acoustic output. Such phenomena (e.g., irregular oscillations,
low frequencies, and alternating high/low amplitudes)
are characteristic of the acoustic output of creaky
voice  (Hollien  &  Michel,  1968).  Listening  to  the  acoustic 
output  also  gave  the  perception  of  creaky  voice. 

A spectral bifurcation diagram (e.g., Lauterborn,
1986) is shown in Fig. 6, where Ec is slowly varied from 0.35
to 0.65 kPa. From left to right, one views transitions from
a subharmonic regime to chaos to the periodic regime
characteristic of normal phonation. This figure shows
striking similarities to spectrograms of newborn cries (Mende
et al., 1990) and to acoustic cavitation experiments
(Lauterborn, 1986). A more complete bifurcation analysis
of  this  region  will  be  treated  in  a  forthcoming  paper. 

For  the  present  investigation,  empirical 
eigenfunctions were determined at Ec = 0.4 kPa and Ec = 0.5
kPa. Table 2 shows the eigenvalues for both parameter
configurations. At Ec = 0.4 kPa, four eigenfunctions are
needed  to  describe  the  nodal  trajectories  in  as  much  detail 


Table 2.
Normalized eigenvalues for modes of Ec = 0.4 kPa and 0.5 kPa.

               Ec = 0.4 kPa                        Ec = 0.5 kPa
Mode Number    λj (%)    Cumulative sum of λj (%)    λj (%)    Cumulative sum of λj (%)
1              43.9      43.9                        45.6      45.6
2              30.9      74.8                        27.0      72.6
3              16.1      91.0                        12.5      85.1
4               7.1      98.1                         5.2      90.3
5               0.7      98.8                         3.1      93.4
6               0.5      99.3                         1.6      95.0



(in terms of variance) as the first two eigenfunctions at
Ec = 2 kPa (see Table 1). At Ec = 0.5 kPa, additional
eigenfunctions are needed. However, even for the complicated,
nonperiodic behavior at Ec = 0.5 kPa, relatively few
eigenfunctions are needed to capture most of the variance
of vocal fold dynamics. Out of 414 possible eigenfunctions,
only six are needed to describe the motion in considerable
detail.

Moreover, the first three spatial eigenfunctions at
Ec = 0.5 kPa are essentially equivalent to the first three
spatial eigenfunctions at Ec = 0.4 kPa (approximately 90%
agreement). Table 3 shows the dot product of the first three
eigenfunctions of the two parameter configurations. Furthermore,
the three eigenfunctions of Ec = 0.5 kPa also
explain over 90% of the variance for higher values of Ec
corresponding to normal phonation. Thus, during the entire
cascade, three low-order “modes” explain most of the
variance.
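
A dot-product comparison of this kind can be sketched as follows. The two sets of eigenfunctions here are hypothetical: the second is a slightly perturbed, re-orthonormalized copy of the first, standing in for the eigenfunctions obtained at two nearby parameter values. Absolute values are taken because the sign of an eigenvector is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(2)
M, K = 20, 3        # number of coordinates and of compared modes (illustrative)

# Hypothetical orthonormal eigenfunctions for configuration A (columns).
phi_a, _ = np.linalg.qr(rng.standard_normal((M, K)))

# Configuration B: small perturbation of A, re-orthonormalized by QR.
phi_b, _ = np.linalg.qr(phi_a + 0.02 * rng.standard_normal((M, K)))

# Entry (i, j) is |phi_a_i . phi_b_j|; 1 means identical up to sign,
# 0 means orthogonal.
overlap = np.abs(phi_a.T @ phi_b)

print(np.round(overlap, 2))
```

Diagonal entries near 1 with small off-diagonal entries indicate that the corresponding modes of the two configurations have nearly the same spatial shape.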

The  fact  that  this  simulation  of  partial  differential 
equations  (PDE’s)  can  be  projected  onto  just  a  few 
eigenfunctions  is  reminiscent  of  the  findings  of  Saltzman 
(1962) and Lorenz (1963) in relation to Bénard convection.
In  their  studies,  it  was  found  that  close  to  the  onset  of 
convection  there  were  only  a  few  dominant  modes,  which 
led  to  the  derivation  of  the  celebrated  Lorenz  equations 
(1963). While Lorenz employed a trigonometric expansion,
in this study empirical eigenfunctions might be appropriate
to reduce the original PDE's to a small set of ordinary
differential  equations  (ODE’s).  Such  reductions  may  be 
useful  for  the  design  of  lower-order  models  (which  can 
nevertheless  simulate  various  vocal  qualities),  and  may 
help  facilitate  bifurcation  analyses  over  specific  parameter 
regions  of  the  model  (Deane  et  al.,  1991). 

The essential difference in the system at Ec = 0.4
kPa and Ec = 0.5 kPa is revealed by the temporal
eigenfunctions, as illustrated in Fig. 7. The temporal
eigenfunctions of Ec = 0.4 kPa are nearly periodic, while the
temporal eigenfunctions of Ec = 0.5 kPa are nonperiodic.


Table 3.
Dot product of first three modes of Ec = 0.4 kPa and 0.5 kPa

Phase portrait reconstructions of the attractors for Ec = 0.4
kPa and Ec = 0.5 kPa were generated using the first three
temporal eigenfunctions of each configuration. The temporal
eigenfunctions of Ec = 0.4 kPa result in the attractor
of Fig. 8a, which shows weakly-modulated periodic motion.
The nonperiodic temporal eigenfunctions of Ec = 0.5
kPa  yield  the  chaotic  attractor  shown  in  Fig.  8b.  These 
results  lend  support  to  our  earlier  claims  that  the  origin  of 
chaos  in  vocal  fold  vibrations  is  the  desynchronization  of 
a  few  of  the  low-order  modes. 

The  first  three  spatial  eigenfunctions  of  Ec  =  0.5 
kPa  are  shown  in  Fig.  9.  In  this  case,  it  may  not  be  possible 
to  claim  a  definite  relationship  between  the  empirical 
eigenfunctions  and  the  normal  modes  of  the  simplified 
folds.  Indeed,  many  factors  (e.g.,  tissue  incompressibility, 
complex  geometry,  nonlinearities)  may  cause  significant 
deformations  in  the  modes  of  vibrations.  Nevertheless,  the 


Figure 8. Phase portrait reconstructions of attractors for Ec = 0.4 kPa (a,
top) and Ec = 0.5 kPa (b, bottom) using the first three temporal eigenfunctions
from each. Weakly-modulated periodic motion is shown in (a) and a
chaotic attractor in (b).


eigenfunctions appear to be manifestations of simple, low-order
modes. For example, the first eigenfunction (Fig. 9a)
shows some resemblance to a z-10 mode, the second
eigenfunction  to  an  x-11  mode,  and  the  third  eigenfunction 
to  an  x-10  mode. 

In addition, the sixth eigenfunction for Ec = 0.5 kPa
is analogous to a higher-order normal mode (e.g., the x-21
mode). Although not as commonly observed, these higher-order
modes have been viewed occasionally with high-speed
cinematography (Rubin & Hirt, 1960). Fig. 10 shows
a superior view of this eigenfunction. This eigenfunction
did not appear, at least as clearly, in the more stable
oscillations corresponding to Ec = 0.4 kPa and Ec = 2 kPa.
This  may  be  related  to  the  fact  that  this  is  an  unstable 
eigenfunction,  and  is  thus  usually  only  excited  during  more 
unstable,  nonperiodic  vibrations.  Even  in  the  complex 
oscillations  from  which  this  eigenfunction  was  extracted, 
the  higher-order  eigenfunction  was  so  weak  that  it  could  not 
be  visually  detected  in  the  overall  vibration  pattern. 


Discussion 

As  is  well-known  from  high-speed  films, 
stroboscopy, and sophisticated models, vocal fold vibrations
exhibit complex three-dimensional patterns. However,
normal phonation produces fairly periodic acoustic
output. These observations may be explained by the fact
that only a few modes are excited and all the modes are
entrained. This concept has been substantiated through
examining the empirical eigenfunctions of a biomechanical
simulation  of  vocal  fold  vibrations  during  self-oscillation. 
Even  though  hundreds  of  degrees  of  freedom  exist,  two 
eigenfunctions  explain  98%  of  the  variance  of  the  nodal 
trajectories.  By  viewing  the  high-order  simulation  as  a 
superposition  of  just  two  dominant  eigenfunctions,  an 
interpretation  of  the  mechanism  of  self-oscillation  of  the 
folds  is  facilitated.  These  eigenfunctions  are  qualitatively 
similar to x-11 and x-10 modes (although the x-11 mode is
also  coupled  with  a  z-10  mode  which  is  partially  a  result  of 
tissue  incompressibility). 

The  technique  of  empirical  eigenfunctions  is  also 
useful  for  describing  irregular  oscillations  related  to  rough 
voice.  Changing  parameters  of  the  biomechanical  model 
leads  to  subharmonic  regimes  and  chaos.  However,  despite 
complex motion, a relatively small number of eigenfunctions


Figure 9. Coronal views of the first three spatial eigenfunctions at Ec = 0.5 kPa. Eigenfunction 1 shows some resemblance to a z-10 mode, eigenfunction
2 to an x-11 mode, and eigenfunction 3 to an x-10 mode. Coronal views are shown as in Figure 4.

Figure 10. [...] a series of 5 sequential frames is shown from left to right.

captures the essential dynamics of the folds. In addition,
some of the more subtle dynamics captured by “weaker”
eigenfunctions  correspond  with  higher-order  normal  modes. 

Although  the  oscillation  pattern  in  the  simulation 
changed  substantially  with  decreasing  stiffness  of  the  cover, 
the spatial “modes” remained more or less the same.
Consequently, it is shown that the appearance of chaos in
vocal fold oscillations may be understood in terms of a
desynchronization of a few of the low-order “modes”.

Furthermore,  such  eigenfunctions  may  be  useful 
for  designing  lower-order  models  capable  of  simulating 
specific  vocal  disorders,  and  for  performing  bifurcation 
analyses.  The  technique  can  also  be  useful  for  evaluating 
and  refining  biomechanical  simulations  of  the  folds,  and  for 
assessing  the  impact  of  various  design  parameters  on  modal 
shapes.  In  particular,  one  might  start  from  the  conditions  of 
the  simplified  folds  for  which  analytic  solutions  of  the 
normal modes are known (Titze & Strong, 1975). Then the
deformation  of  the  normal  modes  caused  by  nonlinearities, 
complex  geometry,  and  tissue  incompressibility  may  be 
observed systematically and independently as one complexity
at a time is added to the system. The technique can
also  be  used  to  determine  eigenfunctions  from  empirical 
data  obtained  directly  from  vocal  fold  tissues,  provided 
such  data  can  be  obtained.  In  general,  the  method  of 
empirical  eigenfunctions  enhances  the  study  of  vocal  fold 
dynamics  by  allowing  the  principal  modes  of  vibration  to  be 
extracted during self-oscillation, despite inherent
nonlinearities.
Abstract

The relation between subglottic pressure and fundamental
frequency of vocal fold vibration was studied by
means of evoked phonation in an in vivo canine model. The
evoked-phonation model involved electrical stimulation of
the midbrain that resulted in consistent responses by respiratory
and laryngeal musculature accompanied by phonation.
The dynamic stiffness properties of the vocal folds,
especially the “cover,” were investigated by delivering
various amounts of air pressure to the larynx from an
opening in the trachea. Fundamental frequency of vocal
fold vibration increased linearly with subglottic pressure.
The slopes ranged from 22.4 to 118.7 Hz/kPa in 7 animals.
The  results  indicated  that  the  dependence  of  fundamental 
frequency  on  subglottic  pressure  is  a  passive  mechanical 
phenomenon. 

Introduction 

The vocal fold has a unique multilayered structure.
Owing  to  this  characteristic  structure,  the  concept  of  a 
cover-body  complex  has  become  a  prevalent  hypothesis  in 


explaining  vocal  fold  function '.  Modes  of  vocal  fold 
vibration are determined by a combination of the mechanical
characteristics of the cover and the body of the vocal
folds1. In canines, vocal fold oscillation during phonation
is  confined  primarily  to  the  cover  of  the  vocal  folds ,A4.  The 
cover  includes  a  surface  layer  of  mucus,  the  stratified 
squamous epithelium, and Reinke’s space, also known as the
superficial  layer  of  lamina  propria. 

For fundamental frequency (F0) control, the coordinated
action of the cricothyroid and thyroarytenoid
muscles, as well as other intrinsic and extrinsic laryngeal
muscles, determines the effective stiffness of the cover and
body of the vocal folds. In addition, subglottic pressure (Ps)
influences the F0 of vocal fold vibration to some degree54.
The influence of Ps on F0 can be included in the effective
stiffness of the vocal folds9. As Ps increases, the amplitude
of lateral excursions of the vocal folds will increase;
consequently, vocal fold tension increases dynamically.
Theoretically, if the coordinated actions of the intrinsic
laryngeal muscles can be held constant, the relation
between Ps and the F0 of vocal fold vibration should be
linear9.
In this experiment, we used an in vivo canine
model of evoked phonation by electrical stimulation of the
midbrain. Such stimulation elicits activation of laryngeal
and respiratory muscles that results in a phonatory
response. The response is repeatable and robust over
many trials of stimulation. Known air pressures were
applied to the larynx through an opening in the trachea.
Under these conditions, the relation between Ps and F0 was
investigated in order to determine whether the results of
previous studies of the cover-body hypothesis, conducted in
the  absence  of  active  muscle  tensions,  would  be  the  same  as 
the  results  obtained  with  naturally  coordinated  activity  of 
the  laryngeal  muscles. 

Method 

The study was conducted with 7 hound-like mongrel
dogs, weighing approximately 20 kg. The animals
were anesthetized to surgical levels (absence of corneal and
deep pain reflexes) with pentobarbital (IV, 25 mg/kg). The
adequacy  of  the  anesthesia  was  checked  frequently,  and 
additional  pentobarbital  was  given  as  needed.  At  the 
conclusion  of  the  experiment,  animals  were  euthanized 
with  a  large  dose  of  pentobarbital. 

The animals were positioned for the evoked phonation
experiment in a stereotaxic apparatus. A schematic
representation of the stereotaxic frame and electrode carrier,
with representation of the various measurements that
were made, is provided in Fig. 1. The animal’s head was
fixed by using earbars inserted firmly in the external
auditory meati, infraorbital ridge holders which were in the
plane of the earbars, and a maxillary brace. The cortex was
exposed by cutting a 1.5 cm diameter hole through the
parietal bone and lacerating the underlying dura; this
opening was centered 10 mm anterior and 5 mm lateral to
ear bar zero. At these stereotaxic coordinates, a coaxial
bipolar electrode (Rhodes NE-100) was inserted vertically
into the brain to an initial depth of 20 mm dorsal to ear bar
zero.


Figure  1.  Schematic  representation  of  the  experimental  set-up  used. 


Before  beginning  exploration  of  the  midbrain  with 
electrical stimulation for the purpose of eliciting phonation,
the  animal  was  rotated  from  a  prone  to  a  supine  position  to 
allow  access  to  the  ventral  neck.  This  is  because  rotation  of 
the  animal  can  produce  a  change  in  the  position  of  the  brain, 
and  previously  established  responses  to  stimulation  may  be 
lost. 

A  midline  skin  incision  of  the  ventral  neck  was 
made  and  the  larynx  and  trachea  were  identified.  Bipolar 
stainless steel hooked-wire electrodes (75 µm diameter)
were inserted into cricothyroid (CT), thyroarytenoid (TA),
lateral cricoarytenoid (LCA), and posterior cricoarytenoid
(PCA) muscles for electromyographic (EMG) recording.
The locations of the electrodes were verified by dissecting
the related muscles after the experiment. Two low-pressure
cuffed cannulae were inserted in the trachea, one between
the third and fourth and one between the sixth and seventh
tracheal rings. The more caudally located cannula was used
for the animal’s natural respiration. The more rostral
cannula was connected to an air source that supplied
warmed and humidified air to the larynx (Concha-Therm
IT).  The  pressure  in  the  trachea,  controlled  with  a  pressure 
regulating  valve  (Fairchild  Model  10),  was  measured  with 
a  pressure  transducer  (Micro  Switch  143PC03G). 

With  the  animal  remaining  in  a  supine  position, 
electrical  stimulation  to  sites  within  the  midbrain  was 
delivered until vocalization was elicited. Sites from 5 to 20
mm dorsal to ear bar zero, at 1 mm intervals, were stimulated.
If acceptable phonation was not obtained with
stimulation at any depth with the initial anterior-posterior
and medial-lateral coordinates, the electrode was withdrawn
and additional vertical tracks were made at 1 mm
deviations from previously attempted locations. Electrical
stimuli consisted of a 2-3 s train of 0.2 ms pulses at a rate
of 200 Hz. An electrode placement was considered acceptable
for the experiment if low-pitched phonation with
minimal cricothyroid muscle activity (Fig. 2) was elicited
consistently at low current levels (0.3-0.5 mA). Once
selected, the site for electrical stimulation of the midbrain
was constant throughout the experiment.

With concurrent applications of electrical stimulation
to the midbrain and air pressure to the larynx,
phonation  could  be  sustained.  During  the  evoked  response 
it  was  necessary  to  occlude  partially  the  outlet  of  the 
tracheal  cannula  connected  to  the  lungs.  If  left  open,  the 
lack  of  the  respiratory  resistance  ordinarily  provided  by  the 
closed glottis caused the lungs to deflate rapidly, producing
a series of rapid brief phonations associated with interruptions
of the activity of the laryngeal muscles. This effect has
been  described  previously  M'l}  and  is  known  to  be  a  reflex 
mediated  by  afferents  in  the  vagus  nerve.  The  air  pressure 
delivered to the larynx (Ps) during evoked phonation ranged
from  the  lowest  to  the  highest  pressures  that  could  sustain 
phonation  in  the  modal,  or  chest,  register  (about  0.3  to  4 
Figure 2. A typical response to a 2-s electrical stimulation (bottom trace)
from four intrinsic laryngeal muscles (thyroarytenoid (TA), lateral
cricoarytenoid (LCA), posterior cricoarytenoid (PCA), and cricothyroid
(CT)), subglottic pressure (provided by the dog’s own lungs), and voice in
dog 05 is illustrated. The electrode placement in the midbrain was 11 mm
anterior, 5 mm lateral, and 14 mm dorsal to ear bar zero. Note that TA and
LCA activity is present and PCA and CT activity is absent during phonation.
The bar at the bottom of the figure represents 300 ms. During this time,
mean Ps is 1.2 kPa and mean F0 is 133 Hz.


kPa). Pressure was increased in increments of about 0.1 kPa
in an ascending series of phonations, and then, after a period
of rest, a descending series was recorded. The Ps was
approximately  constant  during  each  midbrain  stimulation. 
Analysis  was,  however,  based  upon  the  directly  recorded 
pressure  in  the  trachea.  The  voice  was  recorded  by  an 
electret microphone placed just outside the animal’s mouth.

Data were recorded on an 8-channel FM data tape
recorder (Hewlett Packard 3968A, bandpass DC-2.5 kHz).
Recorded signals were a midbrain-stimulation marker,
EMG from laryngeal muscles (including TA, LCA, PCA,
CT), subglottic pressure, and voice. The recorded data were
subsequently processed with an IBM PC data acquisition
and processing program (Codas DATAQ). F0 was calculated
from the voice signal using an FFT analysis function.
Mean Ps was obtained by averaging the pressure signal
during phonation over a period of approximately 0.5 s,
beginning about 0.5 s after the onset of phonation. At least
16 simultaneous measures of Ps and F0 were obtained from
each  animal.  The  data  pairs  were  plotted  and  analyzed 
statistically  using  a  graphics  and  analysis  program  for  the 
PC  (Sigmaplot  4.0). 
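
The pressure-averaging step described above can be sketched as follows. This is a minimal illustration, not the Codas procedure itself: the function name, the sample-rate argument, and the way the onset time is supplied are all assumptions, and the onset of phonation is taken as already known.

```python
def mean_pressure(samples, sample_rate_hz, onset_s, delay_s=0.5, window_s=0.5):
    """Average a pressure trace over a window that starts delay_s after onset.

    samples        -- pressure values (e.g. in kPa), one per sample
    sample_rate_hz -- sampling rate of the digitized trace (assumed known)
    onset_s        -- time of phonation onset in seconds (assumed known)
    """
    start = round((onset_s + delay_s) * sample_rate_hz)
    stop = start + round(window_s * sample_rate_hz)
    if start < 0 or stop > len(samples) or stop <= start:
        raise ValueError("averaging window falls outside the recording")
    window = samples[start:stop]
    return sum(window) / len(window)


if __name__ == "__main__":
    # Hypothetical trace sampled at 10 Hz: phonation starts at 0.5 s.
    trace = [0.0] * 10 + [2.0] * 5 + [4.0] * 15
    print(mean_pressure(trace, sample_rate_hz=10, onset_s=0.5))
```

Delaying the window past the onset keeps the initial pressure transient out of the mean, which is presumably why the averaging begins about 0.5 s into the phonation.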

Results 

Consistent  low-pitched  phonation  was  elicited 
from each animal during midbrain stimulation. Laryngeal
activity  during  phonation  was  characterized  by  increased 
EMG  activity  of  the  TA  and  LCA  muscles  and  suppression 
Figure 3. Subglottic pressure (Ps) is applied to the larynx at a low level (A:
Ps = 0J kPa) and a high level (B: Ps = 2.7 kPa) during midbrain stimulation.
Thyroarytenoid (TA) and lateral cricoarytenoid (LCA) activity appear
similar for the two conditions. Voice fundamental frequency (F0) is 109 Hz
in A and 186 Hz in B. The time bar is 3 ms.


of  CT  and  PCA  muscle  activity  (Fig.  2).  The  level  of 
activation  of  the  TA  and  LCA  remained  the  same  when 
large  differences  of  pressure  were  delivered  to  the  larynx 
(Fig.  3). 

The F0-Ps data pairs for 3 animals (Numbers 2, 4,
and 7) are plotted in Figure 4. The solid lines represent the
linear regressions for each data set. For these 3 animals and
the other 4 animals (see Table 1), the data points fell close
to their respective regression lines. This result is reflected
in the R² values, which were close to 1 for all of the
experiments (Table 1). Thus, F0 was demonstrated to be
linearly related to Ps.
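
For readers who wish to reproduce this kind of analysis, the following is a minimal sketch of an ordinary least-squares fit of F0 against Ps, returning the slope (Hz/kPa), the intercept, and R². The function name is an assumption and the data pairs in the usage example are hypothetical values chosen for illustration; they are not measurements from this study.

```python
def linear_fit(x, y):
    """Ordinary least-squares line y = slope * x + intercept, with R^2."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equally long sequences with at least 2 points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared


if __name__ == "__main__":
    # Hypothetical (Ps in kPa, F0 in Hz) pairs, for illustration only.
    ps = [0.5, 1.0, 1.5, 2.0, 2.5]
    f0 = [100.0, 115.0, 131.0, 144.0, 160.0]
    slope, intercept, r2 = linear_fit(ps, f0)
    print(f"slope = {slope:.1f} Hz/kPa, intercept = {intercept:.1f} Hz, R^2 = {r2:.3f}")
```

An R² close to 1 means that nearly all of the variation in F0 is accounted for by the straight line, which is the sense in which the data points are said to fall close to their regression lines.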

The change in F0 with Ps (the slope of the regression
lines) ranged from 22.4 Hz/kPa (or 2.3 Hz/cm H2O) to
118.7 Hz/kPa (or 12.1 Hz/cm H2O). The extreme values for
slope are illustrated in Fig. 4 (dogs 2 and 7), along with one
having an intermediate value (dog 4). Ps across all experiments
ranged from 0.32 kPa to 4.08 kPa. The data from the
experiments with the smallest and the largest ranges of Ps
are included in Figure 4 (dogs 2 and 4 respectively). F0 for
all elicited phonations ranged from 81 Hz to 235 Hz, and the
phonations were perceived to be in modal register.
Figure 4. Voice fundamental frequency is plotted against subglottic
pressure for dogs 2 (open triangles), 4 (closed triangles), and 7 (open
circles). The solid lines are regression lines for each data set.


Discussion 

It  is  well  documented  that  vocal  fold  vibration  is 
a  mechanical  phenomenon14.  The  mechanical  parameters 
include  subglottic  aerodynamic  power  and  the  mechanical 
characteristics  of  the  vocal  folds’  cover  and  body.  The 
purpose  of  the  present  experiment  was  to  examine  the 
mechanical  characteristics  of  the  vocal  fold  cover  during 
phonation with normal active muscle tensions. A modified
in vivo model was used in which the midbrain of anesthetized
dogs was stimulated electrically to evoke phonation ,#.
Consistent phonations with similar patterns of muscular
activity  (increased  activity  of  the  TA  and  LCA  muscles  and 
decreased  activity  of  the  CT  and  PCA  muscles)  were 
elicited  over  many  repetitions.  By  using  repeated  evoked 
phonation,  minor  differences  in  the  patterns  of  intrinsic 
laryngeal  muscle  activity  could  be  disregarded  and  we 
could examine various aspects of the mechanical characteristics
of phonation.

From a model advanced by Titze *, it can be
inferred that F0 increases linearly with Ps. The mechanism
for this relationship is that an increase in Ps will increase the
lateral excursion of vocal fold displacement (the amplitude
of the vibration). Because the amplitude to length ratio is
not small, vocal fold tension will increase dynamically,
resulting in a rise in F0 with Ps. The present results
confirmed this inference. The data illustrated a clear linear
relationship between F0 and Ps. Thus, the mechanical nature
of vocal fold vibration was supported.

Similar results have been reported by Moore and
Berke*. They used an in vivo canine model with recurrent
laryngeal nerve stimulation to elicit vocal fold adduction,
and also found that F0 increased linearly with Ps. Interestingly,
the range of Ps over which this linear relationship was
found was approximately 0.3 to 4 kPa (3 to 40 cm
H2O) with midbrain stimulation in the present study and 4 to
12 kPa (40 to 120 cm H2O) with recurrent laryngeal nerve
stimulation by Moore and Berke. Therefore, it appears that
this linear relationship can be generalized to a wide range
of pressures.

The degree to which F0 changed with Ps across
animals was quite variable, ranging from 22.4 to 118.7 Hz/
kPa. This variability has been attributed to different pre-phonatory
vocal fold length conditions*. It was not possible
to  directly  evaluate  Titze’s  hypothesis  about  the  effect  of 
pre-phonatory  vocal  fold  length  in  the  present  experiment 
because  the  direct  view  of  the  vocal  folds  was  not  adequate. 
This  is  an  obvious  area  in  which  the  experiment  may  be 
improved. 
Abstract

Acoustic analysis of voice sometimes requires a
constant mouth to microphone distance. In this study a
miniature head mount condenser microphone was compared
to a larger, professional grade condenser microphone
typically mounted on a stand. Long term and short term
amplitude and frequency perturbation measures of human
phonation were made for comparison. The results indicate
that  for  this  type  of  analysis  only  small  differences  exist 
between  the  two  microphones.  This  suggests  that  errors 
associated  with  variable  source  to  microphone  distance  can 
be  reduced  without  losing  baseline  quality  in  transducing 
voice  signals  for  analysis. 

Introduction 

In  a  previous  study  of  the  effects  of  microphone 
type and placement on voice perturbation measures (Titze
and Winholtz, in press), it was found that a mouth to microphone
distance of a few centimeters was desirable for high precision
perturbation analysis. At close distances, however, a
small change in mouth orientation can cause a significant
change in amplitude and phase of the microphone output.
If small children or patients with head tremor (or other
uncontrollable movements) are subjects for recording,
meaningful data collection and analysis may be difficult.

To minimize error due to motion artifact, the
obvious solution is to use a head mounted microphone that
maintains a constant distance of a few centimeters from the
mouth. However, the aerodynamic artifacts for near-to-mouth
distances have not been documented well for many
vocal  tasks.  The  solution  proposed  here  may  therefore  not 
be  applicable  for  all  vocal  tasks. 


This  study  will  focus  on  comparison  between  a 
miniature  head  mounted  microphone  and  a  larger  high 
quality  microphone  mounted  on  a  stand.  Amplitude  and 
frequency  perturbation  measures  on  human  sustained  vowel 
phonation will be used in the comparison.

Description 

The two microphones chosen for this study were
(1) an AKG C410 (Fig. 1a - next page) miniature cardioid
condenser head mount and (2) an AKG C451EB (CK-22
capsule) (Fig. 1b - next page) standard mount omnidirectional
condenser. Figure 2 (next page) shows the manufacturers'
published frequency response curves for the two
microphones (Fig. 2a for the 410 and Fig. 2b for the 451).
It is apparent from the curves that the low frequency
response of the 410 begins to roll off near 150 Hz, whereas
no roll-off is seen for the 451. Sensitivity was measured by
placing the microphones 4 cm in front of a loudspeaker
producing a 200 Hz sine wave at 80 dB sound pressure level
(SPL) and subtracting a preamplifier gain of 60 dB. The 410
had a sensitivity of -58.6 dB and the 451 had a sensitivity of
-49.6 dB, indicating a 9 dB greater sensitivity for the 451.
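
To put the 9 dB figure in more familiar terms, the short sketch below converts the difference between the two quoted sensitivities into an output-voltage ratio. The constant and function names are assumptions for illustration; the only inputs are the two sensitivity values given above, and the usual 20 log10 convention for amplitude quantities is assumed.

```python
def db_to_amplitude_ratio(db):
    """Convert a level difference in dB to an amplitude (voltage) ratio."""
    return 10.0 ** (db / 20.0)


SENSITIVITY_410_DB = -58.6  # head mounted microphone, value quoted in the text
SENSITIVITY_451_DB = -49.6  # stand mounted microphone, value quoted in the text

difference_db = SENSITIVITY_451_DB - SENSITIVITY_410_DB
voltage_ratio = db_to_amplitude_ratio(difference_db)

if __name__ == "__main__":
    print(f"difference = {difference_db:.1f} dB, voltage ratio = {voltage_ratio:.2f}")
```

Under that convention, the 451 delivers roughly 2.8 times the output voltage of the 410 for the same sound pressure.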

Method 

Twenty  normal  subjects,  10  male  and  10  female, 
were  asked  to  sustain  the  vowel  [a]  at  two  pitches,  low 
(100Hz  for  males,  200Hz  for  females)  and  medium  (200Hz 
for  males,  400Hz  for  females)  at  a  comfortable  loudness 
level.  The  subjects  were  seated  in  a  chair  with  a  headrest 
and  instructed  to  keep  their  head  position  constant.  To  test 
the  low  end  of  the  fundamental  frequency  range,  two  of  the 
male subjects were also asked to phonate at 75 Hz, as well
Figure 1. Diagram of the microphones: (a) AKG C410 miniature head
mount microphone, (b) AKG C451EB standard mount microphone.

as  to  produce  vibrato  at  the  low  and  medium  pitches.  In 
addition,  one  male  subject  was  asked  to  sustain  the  vowel 
while moving his head in one of two manners: drift, a slow
(0.5 Hz) movement from the axis of the mouth to approximately
30° off axis; and wobble, a slightly faster (1 Hz)
periodic movement from 0° to 30° on either side of the
mouth axis. This task was performed for low and medium
F0. Drift was to approximate what a normal subject might
do  (particularly  a  child),  and  wobble  was  to  approximate 
what  a  subject  with  a  head  tremor  might  do. 

The  experiment  was  conducted  in  an  IAC  isolation 
booth, 3.2 m deep by 3.5 m wide by 2.4 m high. Ambient SPL
of  the  booth,  measured  with  a  B&K  2230  SPL  meter  on  the 


Figure 2. Frequency response curves for (a) the 410 microphone, (b) the
451 microphone.


linear weighting scale (20 Hz to 20 kHz), was 53 dB near the
center of the booth. The microphones were positioned at a
4 cm distance, one at +90° to the mouth axis and one at -90°
to the mouth axis. The two microphone signals were
preamplified (ATI M-1000) and recorded simultaneously
with a Panasonic SV-3700 DAT recorder. Later the signals
were played out of the DAT recorder, high pass filtered at
60 Hz (24 dB/oct), amplified by Tektronix AM502 amplifiers
and sampled by a DSC-200 16-bit A/D converter at 20K
samples/second. A two second segment near the middle of
each subject's phonation was digitized.

A least squares waveform matching algorithm
(Milenkovic, 1987; Titze and Liang, in press) was used to
analyze the two second segments of the digitized signals.
This algorithm provides the highest precision in F0 extraction
for small perturbations (Titze and Liang, in press). The
perturbation measures included CV (coefficient of variation,
applicable to slow deviations from the mean amplitude
and F0) and PF1 (the mean rectified value of the first order
perturbation function, applicable to more rapid cycle-to-cycle
variations). PF1 is commonly called jitter when
applied to an F0 contour and shimmer when applied to an
amplitude contour. A correlation of these measures for the
different vocal tasks was used to compare the microphones.
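
The two perturbation measures can be sketched as follows for a per-cycle contour (F0 or amplitude). This is an illustration under stated assumptions rather than the exact definitions used in the study: the first order perturbation function is taken to be the first difference of the contour, both measures are normalized by the contour mean and expressed in percent, and the population standard deviation is used for CV. The function names are likewise assumptions.

```python
import math


def coefficient_of_variation(contour):
    """CV in percent: population standard deviation divided by the mean."""
    n = len(contour)
    mean = sum(contour) / n
    variance = sum((x - mean) ** 2 for x in contour) / n
    return 100.0 * math.sqrt(variance) / mean


def pf1(contour):
    """Mean rectified first difference in percent of the mean.

    Assumed here to correspond to jitter for an F0 contour and to
    shimmer for an amplitude contour.
    """
    if len(contour) < 2:
        raise ValueError("need at least two cycles")
    mean = sum(contour) / len(contour)
    diffs = [abs(b - a) for a, b in zip(contour, contour[1:])]
    return 100.0 * (sum(diffs) / len(diffs)) / mean


if __name__ == "__main__":
    # Hypothetical per-cycle F0 values in Hz, for illustration only.
    f0_contour = [100.0, 100.4, 99.8, 100.1, 99.9, 100.3]
    print(f"CV  = {coefficient_of_variation(f0_contour):.3f} %")
    print(f"PF1 = {pf1(f0_contour):.3f} %")
```

A slow drift in the contour raises CV while leaving PF1 nearly unchanged, whereas rapid cycle-to-cycle alternation raises PF1, which is why the two measures are treated as long term and short term respectively.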

Results 

Since the recordings with the two microphones
were simultaneous, individual tokens could be compared
with a scatter plot. This is shown in Figure 3 for the
amplitude  perturbation  measures  and  Figure  4  for  the 
frequency  perturbation  measures.  Tokens  for  the  twenty 
normal  subjects  are  represented.  The  X  axis  represents 
Figure 3 (top). Scatter plot of microphone 1 (the 410) versus microphone
2 (the 451) for amplitude perturbation analysis (CVA and PFA) using ten
male subjects and ten female subjects at two F0s. Figure 4 (bottom). Scatter
plot similar to Figure 3 for frequency perturbation measures.


measures for microphone 1 and the Y axis for microphone 2.

The plots indicate that, for the type of amplitude
and frequency perturbation analysis used here, the microphones
gave similar results over two octaves of F0. Although
the points do not all fall on the diagonal, an
inconsistency is not apparent between the microphones.
One would expect to see a trend of one microphone
producing consistently lower or higher numbers than the
other, particularly in the amplitude measures, if a major
discrepancy existed between the microphones.

Table 1 shows results aimed at testing low F0, low
frequency modulation in the voice, and a moving source.
With the higher rolloff of the 410 frequency response,
perturbation measures from phonation with F0s below this
rolloff would be expected to produce higher measures for
the 410, especially in the case of low F0 with vibrato. The
data did not indicate a compelling difference. However,
there were several interesting observations with the head
motion task. As expected, the long term amplitude measures
(CVA) were affected the most. For subject MI2, CVA
measures for wobble were two to three times higher with the
stationary  microphone  than  with  the  constant  distance 
microphone.  There  were  no  strong  effects  for  fundamental 
frequency  measures,  however. 

Table 2 (following page) presents the correlation
data for the various vocal tasks. Over two octaves of F0 the
correlations were all greater than 0.85 when no motion
artifacts were present. Most of the correlations were above
0.95. No statistical difference was found between the
microphones at the p < .05 level. For the head motion tasks,
however, the correlations were all below .85 (p < .05),
signifying a difference between the recordings.
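
A comparison of this kind can be sketched as below: for paired measures from the two microphones, compute the Pearson correlation together with the mean difference, the latter as a simple check for one microphone reading consistently higher than the other. The function name and the sample values are assumptions for illustration and are not data from Table 2.

```python
import math


def compare_paired(mic1, mic2):
    """Return (Pearson r, mean of mic2 minus mic1) for paired measures."""
    n = len(mic1)
    if n != len(mic2) or n < 2:
        raise ValueError("need two equally long sequences with at least 2 tokens")
    mean1 = sum(mic1) / n
    mean2 = sum(mic2) / n
    s11 = sum((a - mean1) ** 2 for a in mic1)
    s22 = sum((b - mean2) ** 2 for b in mic2)
    s12 = sum((a - mean1) * (b - mean2) for a, b in zip(mic1, mic2))
    r = s12 / math.sqrt(s11 * s22)
    return r, mean2 - mean1


if __name__ == "__main__":
    # Hypothetical CV values (percent) for the same tokens on both microphones.
    mic_410 = [0.8, 1.2, 1.9, 2.4, 3.1]
    mic_451 = [0.9, 1.1, 2.0, 2.5, 3.0]
    r, bias = compare_paired(mic_410, mic_451)
    print(f"r = {r:.3f}, mean difference = {bias:+.3f}")
```

A high correlation with a mean difference near zero corresponds to points scattered closely about the diagonal of the scatter plot.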
Conclusion

The  higher  rolloff  of  low  frequency  response  for 
the  head  mounted  microphone  was  expected  to  produce  a 
degradation of perturbation measures at low F0 (below the
flat  response  of  the  microphone),  especially  in  the  presence 
of  vibrato.  Frequency  modulation,  with  its  concomitant 
amplitude modulation, was expected to affect the long-term
amplitude measures. However, the coefficient of
variation  of  amplitude  (CVA)  measures  did  not  indicate  a 
significant  difference.  A  small  variability  in  short  term 
measures may be partially explained by individual differences
in microphone phase distortion of the acoustic signal
and  by  differences  in  microphone  sensitivity  (Titze  and 
Winholtz,  in  press). 

Simple tests with head motion indicated that movement
artifact can inflate long term amplitude measures by
two  to  three  times.  Since  the  microphones  gave  similar 
perturbation  measures  when  no  movement  was  present,  the 
head  mounted  microphone  showed  a  clear  advantage  by 
maintaining  a  constant  source  to  microphone  distance. 

Some  additional  comments  are  in  order.  Because 
the  head  mounted  microphone  moves  with  the  head,  the 
microphone  cable  must  have  strain  relief  to  eliminate 
motion  noise  that  can  be  conducted  through  the  cable.  Also, 
the  electrical  output  of  the  microphone  is  unbalanced  from 
the  microphone  to  the  XLR  connector.  This  is  unfortunate 
because  it  is  susceptible  to  high  levels  of  electromagnetic 
interference,  such  as  that  radiating  from  a  stroboscopic 
light  source.  Some  further  experimentation  may  be  needed 
to  determine  if  the  microphone  can  be  used  in  various 
different  clinical  environments. 

It is important to note that this study used only
sustained vowel phonation tasks. The effects of aerodynamic
artifacts at close source to microphone distances also
await further study. The head mounted microphone may
not be the ideal choice for all vocal tasks. However, based
upon the correlation data for a limited set of tasks performed
here, it appears that the miniature head mounted microphone
can become a standard for voice perturbation analysis.
Abstract

A  comparison  was  made  between  two  methods  of 
obtaining a Voice Range Profile. One method was traditional,
involving a clinician who gave instructions, motivated
the subject to achieve the greatest intensity range, and
determined  when  the  goal  was  achieved.  The  second 
method  was  completely  automated,  involving  the  use  of  a 
video  tape  for  instruction  and  a  computer  for  elicitation  and 
evaluation.  Results  indicated  that  there  is  no  obvious 
preference for the use of either method, but some differences
are noted.

Introduction 

The Voice Range Profile (VRP), also called the
phonetogram (Damsté, 1970), is used as a clinical tool to
establish a vocalist’s range of intensity from the lowest to
the highest fundamental frequencies (Coleman, Mabis, &
Hinson, 1977; Schutte & Seidner, 1983; Gramming, 1988;
Klingholz, 1990). Typically, the VRP is obtained by


prompting  the  vocalist  with  a  series  of  pitches  from  a 
keyboard  instrument,  or  a  pitch  pipe,  and  requesting  the 
softest  and  loudest  productions  possible  (Coleman,  1993). 
The  clinician  determines  whether  the  effort  was  at  the  right 
pitch.  Motivating  the  vocalist  to  give  the  best  performance 
(their absolute softest and absolute loudest) takes considerable
time and effort from the vocologist (clinician or
teacher).  It  is  not  unusual  to  spend  a  half-hour  to  get  a 
satisfactory  VRP. 

Given  this  investment  in  time,  and  the  possibility 
of  varied  results  due  to  human  interaction,  it  seems  logical 
to  ask  if  the  task  could  be  economized  and  standardized  by 
full  automation.  Each  vocalist  would  receive  exactly  the 
same  instructions  and  coaching  (by  video  tape),  and  the 
frequency-intensity  information  would  be  gathered  and 
plotted  by  computer  (Pabon  &  Plomp,  1988).  The  present 
investigation  deals  with  the  trade-offs  between  the  loss  of 
information and the gain in economy when the VRP procedure
is fully automated.
Methods

The study was conducted in an IAC recording
booth (11'5" x 10'5" x 8') at the Recording and Research
Center of the Denver Center for the Performing Arts. Two
separate  protocols  were  established  for  obtaining  a  VRP, 
and  all  subjects  followed  both  protocols. 

Subjects 

A total of 20 subjects were involved in the study.
These consisted of 10 males ranging in age from 30 to 42
years, with a mean age of 37.4 years, and 10 females ranging
in  age  from  24  to  48  years,  with  a  mean  age  of  32.9  years. 
As  determined  by  a  questionnaire,  the  subjects  were  free  of 
any history of laryngeal pathology and each subject reported
that he/she was in good vocal health at the time of the
study.  A  laryngeal  examination  was  not  conducted  for 
economic  reasons.  None  of  the  subjects  had  had  formal 
voice  training  and  all  were  unfamiliar  with  the  experiment. 
The  subjects  were  divided  into  two  groups.  Five  females 
and  five  males  did  Protocol  1  followed  by  Protocol  2,  while 
the other group of five females and five males did the
protocols  in  reverse  order. 

Protocol  1 

A certified speech-language pathologist (SLP-CCC)
explained the procedure for obtaining a VRP to each
subject. After the procedure was clarified, each subject was
asked  to  produce  a  sustained  /a/  vowel  at  a  comfortable 
pitch  and  intensity.  The  purpose  of  this  was  to  determine 
a  starting  pitch  which  would  presumably  fall  within  a 
comfortable range for each subject. (Invariably, the pitch
volunteered  by  each  subject  was  within  a  few  whole-tones 
of  his/her  lowest  sustainable  frequency  in  modal  register.) 
The  clinician  then  used  a  small  keyboard  instrument  to 
prompt  the  subject  with  successive  target  pitches. 

In  order  to  facilitate  comparison  of  the  clinician- 
assisted  VRP  with  the  automated  program,  the  chosen 
target  pitches  corresponded  roughly  to  the  center  of  each 
“frequency  bin”  contained  in  the  computerized  program. 
The  target  pitches  determined  in  this  manner  consisted  of  a 
whole-tone  scale  spanning  the  subject’s  entire  range  and 
containing the pitches c, d, e, f#, g# and a# in each
successive  octave. 

For  the  most  part,  a  fairly  regular  pattern  of  pitch 
presentation was followed from subject to subject. That is,
each subject was asked to produce the target pitches in
descending order from his or her starting pitch down to his
or her lowest sustainable tone, then in ascending order from
the starting pitch to the highest tone he or she could produce.
The  best  of  three  efforts  at  each  pitch  and  intensity  level 
throughout  the  subject’s  range  was  recorded.  For  some 
subjects,  a  shift  into  a  higher  vocal  register  was  facilitated 
by  allowing  them  at  some  point  to  skip  from  modal  register 
Figure 1. Comparison of automated (solid lines) and clinician-assisted
(dashed lines) Voice Range Profiles, (a) 5 normal male subjects and (b) 5
normal female subjects. The automated procedure was administered first
in both cases.

up  to  a  tone  at  or  near  the  top  of  the  vocal  frequency  range 
and  working  down  from  there.  (This  was  especially  helpful 
in  eliciting  falsetto  productions  from  the  male  subjects.)  In 
some cases, after reviewing the overall profile, the clinician
had the subject repeat his or her efforts in a particular pitch
region  if  she  had  reason  to  feel  the  best  performance  for 
those  tones  was  not  captured  on  the  first  trial. 

The  subjects  were  encouraged  to  produce  their 
least  and  greatest  SPL  efforts  at  each  target  pitch  without 
regard  to  musical  quality  of  the  tone,  but,  at  the  same  time, 
without  causing  discomfort.  When  appropriate,  modeling 
was  provided  by  the  clinician.  Acceptable  productions 
were those judged to be “stable” and at least 1-2 s in
duration. The sound level of the phonations was measured
with a calibrated Bruel and Kjaer Type 2230 sound level
meter at a mouth-to-microphone distance of approximately
8 cm. The meter settings were RMS, fast, and linear
frequency weighting (20 Hz-20 kHz). The clinician determined
by ear whether a particular pitch was adequate or
should  be  revisited. 
Protocol 2

Standardized instructions were given using a 5
minute  video  tape  (narrative  shown  in  the  Appendix).  The 
tape  gave  a  brief  explanation  of  the  purpose  of  the  VRP  and 
how it would be elicited. The subject was encouraged to do
as well as possible without straining the voice and was also
told that phonation in any way, and in whatever order, was
acceptable. The tape featured a person working interactively
with the computer to obtain the VRP. Following the
video  tape,  a  head  mounted  microphone  (AKG  C410), 
located 8 cm from the mouth at an angle of 45 degrees from
the  center  of  the  lips,  was  placed  on  the  subject.  The  subject 
was  reminded  to  use  only  the  /a/  vowel,  to  refrain  from 
adjusting  the  microphone,  and  to  give  the  best  effort  without 
straining  the  voice.  Any  questions  regarding  the  concept  or 
the  procedure  were  answered  at  this  time,  as  long  as  they  did 
not  pertain  to  performance  expectations.  The  voice  signal 
was  acquired  directly  by  computer  and  Fo  and  intensity 
were  extracted  and  displayed  in  real-time.  The  voice  signal 
was  also  recorded  on  DAT  for  archival  purposes. 

The  Fo  and  intensity  extraction  algorithms  were 
implemented on a Mac II FX computer using LabView, a
graphical  block-diagram  oriented  digital  signal  processing 
system.  The  system  consisted  of  display  and  programming 
software  as  well  as  hardware  digitizing  and  processing 
boards  (TMS320C30  based). 

The microphone signal was digitized at 22 kHz
and segments of 512 points were analyzed with the DSP
board. For each segment, the cepstrum was calculated
(Noll, 1967) and the fundamental frequency was extracted
by finding the largest peak in the cepstral signal subsequent
to  the  initial  onset  transient  (Hess,  1983). 
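The cepstral peak-picking step described above can be illustrated with a minimal sketch. It is not the original DSP-board implementation: the Hann window, the exact sample rate of 22050 Hz, and the search limits `f_min` and `f_max` are assumptions made for this example, and skipping the quefrencies below `1/f_max` stands in for excluding the initial transient.

```python
import numpy as np

def cepstral_f0(segment, sample_rate=22050.0, f_min=60.0, f_max=1000.0):
    """Estimate F0 (Hz) of one segment from the largest cepstral peak.

    f_min, f_max and the Hann window are assumptions of this sketch.
    """
    segment = np.asarray(segment, dtype=float)
    windowed = (segment - segment.mean()) * np.hanning(len(segment))
    # Real cepstrum: inverse FFT of the log magnitude spectrum.
    log_spectrum = np.log(np.abs(np.fft.fft(windowed)) + 1e-12)
    cepstrum = np.fft.ifft(log_spectrum).real
    # Skip the low-quefrency part (the initial transient) and search only
    # pitch periods between 1/f_max and 1/f_min.
    q_lo = int(sample_rate / f_max)
    q_hi = min(int(sample_rate / f_min), len(segment) // 2)
    peak = q_lo + int(np.argmax(cepstrum[q_lo:q_hi]))
    return sample_rate / peak
```

With a 512-point segment the longest period that can be searched is 256 samples, so at this sample rate the sketch cannot report pitches much below about 86 Hz regardless of `f_min`.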

While  the  pitch  was  extracted,  the  intensity  was 
obtained from the variance of the microphone signal (average
squared difference from the mean) to eliminate low
frequency drift. For display, the intensity was converted to
decibels.  Once  both  Fo  and  intensity  were  obtained,  they 
were  compared  against  the  noise  threshold  and  not  used  if 
below  the  noise  threshold  (noise  was  below  65  dB  in  the 
recording  booth).  The  intensity-frequency  data  pair  was 
then  displayed  as  a  black  “X”  on  the  screen  and  added  to  a 
list  of  data  pairs  kept  for  the  last  1.5  seconds.  The  list  was 
then  examined  to  see  if  it  met  a  steadiness  criterion.  The 
criterion was defined as: a minimum of two data pairs on the list, less
than 4 dB SPL variation, less than 1.5 seconds time duration
elapsed, and a maximum of 0.09 log units in frequency
variation (log(F_max) - log(F_min) < 0.09). This criterion was a
compromise between stability and subject frustration. Once
the criterion was met, the average intensity and Fo for the
list were calculated. If this point was above the current
maximum or below the current minimum for a frequency
bin (indicated by a blue and a red line, respectively), a
replacement line was drawn.
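The intensity measure, the steadiness criterion, and the per-bin update can be sketched as follows. This is an illustration under stated assumptions rather than the LabView program itself: the logarithm is taken as base 10 (the text says only "log units"), the dB reference is an arbitrary calibration constant, and the function and parameter names are hypothetical.

```python
import math

def intensity_db(samples, reference=1.0):
    """Level in dB from the signal variance (mean removed, so slow drift
    and DC offset do not count). `reference` is an assumed calibration."""
    mean = sum(samples) / len(samples)
    variance = sum((s - mean) ** 2 for s in samples) / len(samples)
    if variance <= 0.0:
        return float("-inf")
    return 10.0 * math.log10(variance / reference ** 2)

def is_steady(pairs, max_db_spread=4.0, max_log_f_spread=0.09,
              max_duration=1.5, min_pairs=2):
    """pairs: list of (time_s, f0_hz, level_db) tuples, oldest first."""
    if len(pairs) < min_pairs:
        return False
    times = [t for t, _, _ in pairs]
    f0s = [f for _, f, _ in pairs]
    levels = [lv for _, _, lv in pairs]
    return (times[-1] - times[0] < max_duration
            and max(levels) - min(levels) < max_db_spread
            # Base-10 logarithm is an assumption of this sketch.
            and math.log10(max(f0s)) - math.log10(min(f0s)) < max_log_f_spread)

def update_bin(profile, bin_index, level_db):
    """Widen the stored (min_db, max_db) of a frequency bin if needed."""
    lo, hi = profile.get(bin_index, (level_db, level_db))
    profile[bin_index] = (min(lo, level_db), max(hi, level_db))
```

If the logarithm is base 10, a spread of 0.09 log units corresponds to a frequency ratio of about 1.23, which is a fairly tolerant pitch window and fits the description of the criterion as a compromise.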


The  automated  program  allowed  the  subject  to 
select  any  pitch  at  any  time,  and  subjects  could  return  and 
improve  their  effort  any  time  during  the  30  minute  period. 

Results 

Figure 1 (previous page) shows overlays of VRPs
obtained with the two procedures on ten normal subjects.
The automated procedure (solid lines) was administered
first and the clinician-assisted procedure (dashed lines) was
administered second. Males are on the left panel (Figure 1a)
and females are on the right (Figure 1b). Each subject is
identified  by  letters  in  the  upper  left  corner  of  each  plot. 
Fundamental  frequency  is  plotted  in  equal  logarithmic 
intervals  and  intensity  is  plotted  in  dB. 

Figure  2  is  similar  to  Figure  1  except  that  the  two 
procedures  are  reversed  and  different  subjects  were  used. 
Thus,  qualitatively  the  results  are  the  same,  but  individual 
differences are evident.

Tables  1  and  2  (next  page)  contain  data  extracted 
from the figures. The area of the VRP envelope is measured


Figure 2. Comparison of automated (solid lines) and clinician-assisted
(dashed lines) Voice Range Profiles, (a) 5 normal male subjects and (b) 5
normal female subjects. The automated procedure was administered
second in both cases.
as a cumulative sum of dB range per bin. Each bin is a unit
wide, and is centered about a tone on the musical scale. For
those bins in which only one loudness level was achieved,
the range for that bin was set to 1 dB. The areas ranged from
260 to 654 dB bins for the clinician-obtained scores and 282
to 836 dB bins for the computer-obtained scores. The
frequency range varied from 12 to 24 bins.
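The area measure lends itself to a short worked sketch. The dictionary layout and the function name are hypothetical, and the 1 dB default for bins with a single loudness level is an assumption of this example.

```python
def vrp_area(profile, single_level_range=1.0):
    """Area of a Voice Range Profile in dB bins.

    profile: dict mapping bin index -> (min_db, max_db).
    Each bin is one unit wide, so the area is the sum of the dB ranges.
    """
    area = 0.0
    for lo, hi in profile.values():
        span = hi - lo
        # Bins with only one recorded level get a nominal range (assumed).
        area += span if span > 0 else single_level_range
    return area
```

For example, three bins with ranges of 30 dB, 40 dB, and a single level would give an area of 71 dB bins under this convention.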

The  maximum  loudness  achieved  was  122  dB 
using the automated procedure and 114 dB using the
clinician-assisted procedure. The maximum intensity range at a
single tone varied from 30 to 52 dB for the automated
protocol  and  27  to  48  dB  for  the  clinician-assisted  protocol. 
Since  these  ranges  are  relative,  they  are  more  useful  than  the 
absolute  maximum  and  minimum  loudness.  It  should  be 
noted,  however,  that  the  tone  or  bin  at  which  the  maximum 
range  occurred  varied  from  subject  to  subject  and  also  from 
one  protocol  to  the  other. 

A  three  factor  repeated  measures  ANOVA  was 
performed  using  gender,  protocol  order  and  protocol  type  as 
factors.  The  area,  measured  in  dB  bins,  was  used  as  a  score. 
The analysis indicated that gender was not highly significant
(F(1,16) = 0.37, p = 0.552). Protocol
order was also not highly significant (F(1,16) = 0.06,
p = 0.803).

Protocol type was significant at the 10% level, but
not at 5% (F(1,16) = 3.13, p = 0.096). The
cross  effects  of  these  factors  were  all  not  significant. 

A  scattergram  of  clinician-obtained  vs  computer- 
obtained  VRP  area  scores  is  shown  in  Fig.  3.  The  Pearson 
bivariate,  two  tailed  correlation  between  these  two  methods 
yields a correlation coefficient of 0.5286 and significance
of p = 0.017, indicating moderate correlation between the
protocols. 
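For readers who want to reproduce this kind of comparison on their own area scores, the correlation and its associated t statistic can be sketched as below. The function names are hypothetical and no subject data from the study are reproduced here.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def t_statistic(r, n):
    """t value for testing r against zero, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))
```

With the reported r = 0.5286 and n = 20 subjects, `t_statistic` gives roughly 2.64 on 18 degrees of freedom, which corresponds to a two-tailed p between 0.01 and 0.02 and is consistent with the reported significance.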

A  subjective,  visual  comparison  of  the  computer 
versus  clinician  obtained  VRPs  presented  in  Figures  1  and 
2  reveals  similarities  between  the  profiles  obtained  by  the 
two  protocols  for  most  of  the  subjects.  Since  the  two 
protocols  were  recorded  on  different  days,  it  is  possible  that 
the  absolute  positioning  of  the  VRP  varied  according  to  the 
distance  between  the  lips  and  the  recording  transducer.  It 
is also possible that there were significant day to day
differences  in  the  performance  of  the  subjects,  as  noted  by 
Coleman  (1993).  These  differences  can  appear  on  the  VRP 
figure  as  vertically  shifted  boundaries.  The  area  calculation 
for each VRP is independent of vertical positioning, however.
If one discounts this vertical shift, the similarities
between the two profiles become more apparent. This is
particularly evident in the profiles of subjects CG, FH, and
MW in Figure 1 and TD, JS, and ER in Figure 2. Of
particular interest is the unusual “square” profile of subject
ER, which is faithfully duplicated in the automated session.

Frequency  (bin)  range  between  the  two  VRPs 
typically differed by one or two bins at either end. Some
notable outliers in the lower frequency bins are evident in
the profiles of CG, PG, and AJ in Figure 1 and ADY, LMI,
AP, and ADL in Figure 2. These do not reflect actual pitch
range differences but appear to be related to the detection
of subharmonics, which are discussed more fully below.

Comparison  of  the  profiles  suggests  that  for  some 
subjects,  clinician  involvement  appeared  to  be  of  great 
benefit.  For  example,  the  computer-elicited  profile  for 
subject PG in Figure 1 reveals a markedly reduced performance
in the higher frequencies as compared to the clinician-assisted
profile. A review of the audiotape revealed
that, for the most part, there was a reduced overall effort by
the subject during the automated session, both in terms of
dynamic range and length of phonation. There appeared to
be a lack of motivation for producing the higher tones.

The  computer-elicited  profile  of  KF  reveals  a 
depressed area on the high SPL tracing through this subject’s
mid-frequency range. Intuitively, one might suspect that
this depressed area of performance could be related to
difficulty on the part of the subject in negotiating a shift
from a heavy, chest-type register to a lighter register. A
review of the audiotape from this subject’s computer-
automated session suggested, however, that KF did indeed
have some difficulty producing high SPLs without the
assistance of a clinician in this transitional region. One also
might  suspect  that  the  “gap”  on  the  low  SPL  tracing  in  this 
region  would  be  typical  of  reduced  dynamic  control  in  the 
region  of  register  shifts;  in  this  particular  instance  audiotape 
review  suggests  that  this  gap  was  likely  due  to  difficulty 
with  phonation  stability,  i.e.,  the  computer  rejected  those 
attempts  as  unstable. 

In  addition  to  occasional  problems  with  phonation 
stability  (which  seems  to  be  more  prevalent  with  higher 
frequency  productions  in  general,  and  with  lower  SPLs  in 
the  modal  register),  gaps  in  the  contour  of  the  automated 
VRP  also  occur  when  a  subject,  for  one  reason  or  another, 
does  not  revisit  a  certain  tone  in  an  effort  to  better  his  or  her 
performance.  This  was  the  case  with  subject  LMS  and 
accounts for the lower trace peaks at about 185 Hz and 520
Hz. The gap at 400 Hz was due to pitch stability problems.

A  number  of  the  subjects  demonstrated  improved 
overall performance in terms of SPL range in the automated
sessions and several commented on the benefit of visual
feedback on their performance, particularly with regard to
self-motivation.  Subject  AP  is  a  subject  whose  SPL  range 
in  the  automated  session  showed  marked  improvement  over 
the  clinician-assisted  session.  It  is  interesting  to  note  that, 
although  this  subject  had  significant  difficulties  with  pitch 
matching  in  the  clinician-assisted  session,  the  automated 
VRP reveals a fairly smooth profile. Undoubtedly AP
benefited considerably from visual feedback in the automated
session, and the lack of a requirement to match
particular  pitches. 

One source of variation between the automated
VRPs and the clinician-elicited VRPs can be attributed to
the fact that occasionally the computer extracts and records
a “subharmonic” rather than the perceived fundamental
frequency.  One  might  argue  that  this  is  an  accurate  acoustic 
representation  of  what  is  actually  being  produced  and 
therefore  a  valid  measure  in  the  VRP.  On  the  other  hand, 
it  is  easy  to  see  how  a  VRP  that  demonstrates  frequency 
ranges  below  150  Hz  in  female  subjects  could  be  very 
misleading  to  a  clinician.  If  these  measures  reflect 
subharmonic activity one octave below the actual pitch
target or perceived fundamental frequency for that phonation,
they should at least be labeled as a different vocal
quality. 

Many times these subharmonics appear as obvious
outliers on the low frequency end of the VRP as can be seen
in the profiles for CG, PG, and AJ in Figure 1 and ADY and
AP in Figure 2. These readings reflect the computer
response to subharmonics produced during the production
of tones in the modal register. In some cases the presence
of subharmonics in the lower bins is less easily identified.
Subject  LMI  in  Figure  2  appears  to  have  a  continuous 
profile  extending  well  below  what  would  be  expected  from 
a female subject. Review of the audiotape revealed that
although her lower range was quite impressive, the recordings
in bins below 100 Hz were actually subharmonics
produced during production of tones one octave above.
This is the case with the lower bins recorded in subject ADL
as  well. 

The recording of subharmonic activity can present
more subtle variations elsewhere in the profile, since production
of very high frequencies occasionally results in
subharmonic recordings that might impact on the shape of
the mid-portion of the VRP. It is felt that this is not a major
problem since the energy produced by these subharmonics
tends to fall within the profile envelope and thus does not
replace SPL values recorded at the actual target pitch. The
fairly  close  agreement  between  overall  VRP  shapes  for 
these  subjects  would  seem  to  confirm  this. 

Conclusions 

Results  of  this  study  suggest  that,  for  normal 
subjects,  a  full  Voice  Range  Profile  can  be  obtained  with  a 
fully-automated protocol. This may save a clinician considerable
time and effort. By using fixed, objective criteria
for determining steady-state phonation, clinician variability
in acceptance criteria is eliminated. The computer
considers voice stability insofar as it is measurable in pitch
variability, intensity variability and timing. Musicality is
not  a  criterion. 

The  visual  feedback  offered  by  the  automated 
procedure  can  be  a  helpful  aid  to  both  the  clinician  and  the 
subject  since  there  are  sometimes  difficulties  in  eliciting 
vocal  utterances  (because  of  pitch-matching  and  pitch- 
perception  problems). 

For  those  clinicians  who  have  trouble  identifying 
pitch,  this  tool  relieves  them  of  this  burden.  The  subjects 
also  benefit  since  they  are  not  required  to  match  specific 
pitches,  but  may  choose  pitches  randomly  throughout  the 
frequency range. There are some subjects who might prefer
presentation of a pitch target or would benefit from a
reminder of a pitch which needs to be revisited. The
program could be modified to work with a keyboard so that
any  tone  desired  could  be  presented  through  an  earphone. 

Review  of  the  audiotapes  from  the  automated 
sessions  coupled  with  subjective  analysis  of  the  profile 
displays  suggest  that  for  some  subjects  the  automated  VRPs 
may  underestimate  the  high  pitch  ranges  (due  to  tracking 
difficulties)  and  overestimate  the  lower  pitch  ranges  (due  to 
the recording of subharmonics). These problems could be
controlled by including a brief period of clinician intervention
at the end of an automated session to confirm the
validity of the lower bins in the VRP and to fill in the gaps
at the upper end (and elsewhere, if necessary). The introduction
of the clinician would make it difficult to maintain
consistency of presentation to the subject, however. The
alternative is to leave it to the subject to interpret their
effort.

The  question  of  how  to  deal  with  the  recording  of 
subharmonics in the automated profile bears further investigation.
Perhaps future modifications to the computer
algorithm might include a means of reducing sensitivity to
subharmonics. When the cepstral technique is used for
pitch tracking, the largest peak in the cepstrum is identified
as  the  fundamental  (excluding  the  transient  part  at  the 
beginning  of  the  time  line).  When  subharmonics  are 
present,  the  largest  peak  in  the  cepstrum  may  appear  at  the 
period  of  the  subharmonic,  indicating  that  this  peak  is  a 
better  candidate  for  the  fundamental.  A  possible  solution 
might  be  to  provide  a  keyboard  for  the  subjects  to  prompt 
themselves.  The  computer  could  identify  the  tone  and 
narrow  its  expected  pitch  period  range  so  that  it  rejected 
fundamental  frequencies  more  than  half  an  octave  away. 
The  subject  would  then  be  expected  to  phonate  close  to  the 
keyboard  tone.  This  would  restrict  the  subject  by  requiring 
pitch  matching  (although  the  order  of  the  pitches  is  still 
determined  by  the  subject).  This  problem  illustrates  the 


need for a better definition of fundamental frequency when
the voice strays from being a steady state phonation to
displaying diplophonic, rough, or even chaotic characteristics.
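The proposed remedy, narrowing the expected pitch period range around a keyboard tone, can be sketched as a constrained peak search. This illustrates the suggestion rather than an implemented feature of the program; the function names, the 22050 Hz sample rate, and the list-based cepstrum are assumptions.

```python
import math

def pitch_period_window(target_hz, sample_rate=22050.0, half_width_octaves=0.5):
    """Quefrency range (in samples) within half an octave of the target."""
    f_lo = target_hz / 2 ** half_width_octaves
    f_hi = target_hz * 2 ** half_width_octaves
    return math.ceil(sample_rate / f_hi), math.floor(sample_rate / f_lo)

def constrained_f0(cepstrum, target_hz, sample_rate=22050.0):
    """Pick the largest cepstral peak only inside the allowed window, so a
    stronger peak at the subharmonic period (twice as long) is ignored."""
    q_lo, q_hi = pitch_period_window(target_hz, sample_rate)
    q_hi = min(q_hi, len(cepstrum) - 1)
    peak = max(range(q_lo, q_hi + 1), key=lambda q: cepstrum[q])
    return sample_rate / peak
```

A subharmonic one octave below the target has twice the period, which falls outside a window of plus or minus half an octave, so it cannot be selected even when its cepstral peak is the larger one.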

The differences in VRP areas between the protocols
for each subject suggest that the VRP obtained is
influenced  by  the  protocol  (but  only  at  the  10%  significance 
level).  It  would  be  useful  to  study  whether  for  a  particular 
protocol,  a  subject  can  achieve  a  consistent  VRP  over  time. 

One might wonder how much of the increased
VRP area demonstrated in some of the computer-obtained
profiles was a result of greater performance on the high SPL
side. This does bring up a two-fold concern: (1) Does one
run the risk of some over-zealous subjects straining their
voices in the situation where clinicians are not present to
monitor vocal behavior, and (2) Are the high SPLs underestimated
in the clinician-assisted sessions because of
clinician bias? The clinician may have had a preconceived
notion of what is safe as a maximum SPL level, based on
effort perceived from the subject. For this reason, it is
possible  that  a  clinician  might  not  push  for  the  loudest 
productions.  The  problems  of  self-inflicted  vocal  abuse  in 
an  automated  procedure  and  over-guarding  in  a  clinician- 
assisted  procedure  need  further  addressing. 
Narrative on Video Tape*

Thank  you  for  participating  in  this  experiment  at 
the  Recording  and  Research  Center. 

On  the  computer  screen  behind  me  is  a  voice  range 
profile program. The purpose of the program is to measure
your  loudest  and  softest  efforts  from  low  pitch  to  high  pitch 
to  determine  your  vocal  range. 

Please  watch  this  video  tape  in  its  entirety  before 
using  the  program.  On  the  horizontal  axis,  the  program  will 
display your pitch, from lowest to highest. On the vertical
axis, the program will display your loudness, from soft to
loud. There is a range switch in the lower right-hand corner
of the screen. To control the switch, use the mouse to move
the cursor on the switch. Press the button to change the
switch from the “M” position to the “F” position. The “M”
position is for pitches less than 500 Hz, or this portion of the
screen.  The  “F”  position  is  for  pitches  greater  than  500  Hz, 
or  this  portion  of  the  screen. 

When you make a sound an “X” will appear on the
screen,  indicating  the  current  pitch  and  loudness  levels. 
When  a  sound  is  held  for  approximately  one  second,  a  blue 
and  a  red  bar  will  appear  on  the  screen.  [Demonstrates] 
“Ah.” The blue bar indicates the loudest effort at that pitch,
and  the  red  bar  indicates  the  softest  effort  at  that  pitch. 

To move the blue bar up, make a louder sound at
the same pitch. [Demonstrates] “Ah.” To move the red bar
down, make a softer sound at the same pitch. [Demonstrates]
“Ah.”

To  increase  your  voice  range  profile,  repeat  the 
same process at different pitches. If the “X” appears
between  the  bars,  they  will  not  move.  They  will  only  move 
when  you  are  louder  than  the  blue  bar  or  softer  than  the  red 
bar.  [Demonstrates]  “Ah.” 

Here’s  an  example  of  how  to  begin  building  your 
voice profile. [Demonstrates] “Ah.” “Ah.” “Ah.” “Ah.”
“Ah.”

On  the  screen  you  see  an  example  of  a  completed 
voice  range  profile. 


You have 30 minutes to do this task. Take your
time.  Remember,  you  can  always  go  back  to  a  particular 
pitch  and  try  for  a  louder  or  softer  sound. 

The  purpose  of  the  experiment  is  to  establish  your 
loudest and softest sounds at every pitch. During the
experiment strive for maximum effort, but avoid discomfort
in your voice. When you are through, use the phone and
dial  the  extension  provided. 

Thanks again for participating in this experiment.


*The tape is available by writing to: The Recording and
Research Center, 1243 Champa Street, Denver, CO 80204.
Abstract

Speaking-time ratios (STR), utterance and pause
durations  of  oral  reading  and  impromptu  speech  were 
investigated  for  twenty  young  adults  (ten  males  and  ten 
females).  The  reading  material  was  the  first  paragraph  of 
the  Rainbow  passage.  The  impromptu  speech  was  elicited 
by  asking  the  subject  to  describe  a  picture.  Results  showed 
that  oral  reading  was  associated  with  greater  STRs  and 
mean  utterance  durations  and  smaller  pause  durations  than 
the  impromptu  speech.  No  gender  differences  were  found 
for  any  of  the  three  measures.  The  average  STR  value  was 
.87  for  oral  reading  and  .69  for  impromptu  speech.  The  STR 
value  of  .87  for  oral  reading  was  much  greater  than  the  value 
of  .70  reported  by  a  normative  study  employing  120  young 
adults.  A  potential  cause  of  such  disparity  and  clinical 
significance  of  STR  measurements  was  discussed. 


Suprasegmental  durational  measures  of  human 
speech reflect important underlying cognitive and physiologic
processes. The suprasegmental durational measures
include  total  speaking  time,  utterance  and  pause  durations, 
voiced  speech  segment  durations  (also  called  phonation 
time)  and  voiceless  speech  segment  durations.  The  total 
speaking time is the sum of the utterance and pause durations.
The utterance duration is sometimes called articulation
time, and is the sum of voiced and voiceless segment
durations.  Given  the  number  of  syllables  or  words  in  the 
utterances,  the  rate  of  speech  can  be  expressed  in  terms  of 
words/minute or syllables/minute by dividing the number
by the speaking time. The rate can be calculated with or
without  pause  durations  included  in  the  speaking  time. 

From  the  primary  duration  measures,  some  ratio 
measures  can  be  derived.  The  speaking-time  ratio  is  the 
ratio  of  the  sum  of  utterance  durations  to  the  total  speaking 
time.  The  phonation-time  ratio  is  calculated  as  the  ratio  of 
the  sum  of  voiced  segment  durations  to  the  sum  of  utterance 
durations  (excluding  pauses)  or  to  the  total  speaking  time 
(including  pauses).  The  articulation-phonation  time  ratio  is 
the  ratio  of  the  sum  of  utterance  durations  to  the  sum  of 
phonation  times. 
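The definitions above translate directly into a few lines of arithmetic. The sketch below is illustrative only: the function and key names are hypothetical, and durations are assumed to be lists of segment lengths in seconds.

```python
def durational_measures(utterances, pauses, voiced):
    """Ratio measures derived from lists of segment durations (seconds)."""
    utterance_time = sum(utterances)   # articulation time
    pause_time = sum(pauses)
    phonation_time = sum(voiced)       # voiced segments only
    total_time = utterance_time + pause_time
    return {
        "speaking_time_ratio": utterance_time / total_time,
        "phonation_time_ratio_excluding_pauses": phonation_time / utterance_time,
        "phonation_time_ratio_including_pauses": phonation_time / total_time,
        "articulation_phonation_time_ratio": utterance_time / phonation_time,
    }

def speech_rate_per_minute(count, seconds):
    """Words or syllables per minute; pass total or utterance-only time."""
    return 60.0 * count / seconds
```

As a made-up example, 8.7 s of utterances and 1.3 s of pauses give a speaking-time ratio of 0.87, and 30 words spoken in those 10 s of total speaking time correspond to 180 words/minute.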

Durational  measurements  have  been  useful  in 
describing  speech  characteristics,  for  example,  of  dramatic 
reading (Fairbanks and Hoaglin, 1941), of superior esophageal
speakers (Horii, 1983b), of a dysphonic patient
(Watanabe et al., 1987), of stutterers (Horii & Ramig, 1987)
and  of  dysarthric  patients  (Till  &  Alp,  1991).  Analysis  of 
pause  frequency  and  durations  of  spontaneous  speech, 
furthermore,  has  received  increasing  attention  as  a  means  of 
examining  latencies  of  underlying  cognitive  processes  (see 
for  example,  review  papers  by  Goldman-Eisler,  1972; 
O’Connell  &  Kowal,  1981;  Rochester,  1973). 

There is, however, a paucity of normative data.
The limited normative data in the literature, furthermore,
are often difficult to compare and are nondefinitive regarding
possible age and gender differences in durational characteristics
of connected speech (Hartman & Danhauer,
1976;  Oyer  &  Deal,  1985;  Walker,  1988).  These  data  are 
difficult  to  evaluate  due  to  confounding  factors  of  speech 
tasks (e.g., oral reading, recitations from memory, spontaneous
speech, conversational speech), differences of the
reading texts, the amount of speech samples, and variations
in definitions of specific measures. Other factors influence
durational  measures  in  connected  speech  as  well.  These 
factors  include  speaking  conditions  (loud  versus  soft), 
emotional  states,  type  of  audience  (children  versus  adults, 
for  example)  and  subject  matters. 

Speech  tasks  (oral  reading  versus  impromptu 
speech,  for  example)  affect  not  only  the  magnitude  but  also 
the  interpretations  of  durational  measurements.  In  oral 
reading, breath grouping is normally determined by the
linguistic  structure  of  utterances.  Readers  pause  at  linguis¬ 
tically  appropriate  boundaries  (e.g.,  phrase  and  sentence 
boundaries)  primarily  to  replenish  the  air  reservoir  in  the 
lungs. In impromptu speech, on the other hand, the speakers
focus not on the oral delivery but on the on-going formulation
of ideas and transformation into linguistically appropriate
strings of utterances. Obviously, the subject matter
of  the  impromptu  speech  profoundly  affects  utterance  and 
pause  durations. 

The  paucity  and  difficulty  of  comparisons  of  the 
limited  normative  data  are  also  attributable  to  instrumental 
difficulties and differences. Durational measures of connected
speech inherently require relatively long speech
samples. Traditional analog methods such as oscillographic
and spectrographic analyses and the use of graphic
level recorders are not well suited for the analysis of such
large samples and are time-consuming, laborious and costly.
Recent advancement of technology including digital methods
and software, however, promises an increasing amount
of  durational  studies  of  connected  speech  (Horii,  1983a; 
O’Connell  &  Kowal,  1981;  Ruder  &  Jensen,  1970;  Till  & 
Alp,  1991;  Walker,  1988;  Watanabe  et  al.,  1987). 

Prior to applications of durational analyses to speech produced
by individuals with various disorders, further accumulation of
normative data is warranted. The purpose of the present
investigation was to examine STR of oral reading of a standard
text and impromptu speech by young adult males and females. The
oral reading task was chosen because it provided tighter control
of speaking conditions, linguistic content and structure, and
size of speech samples. The impromptu speech condition was
included in the study to obtain preliminary STR values for this
speaking condition and to compare them with those of the oral
reading task. The oral reading task was of primary interest for
comparison with the equivocal STR values reported in the
literature. STR values ranging from .53 to .91 for oral reading
tasks can be found in the literature (Fairbanks & Hoaglin, 1941;
Horii, 1983c; Walker, 1988).

Method 

Subjects 

A total of twenty young adults, ten females and ten males,
served as subjects. They ranged in age from 18 to 27 years with
a mean age of 23.9 years. The mean ages of the male and female
groups were identical (23.9 years). They were all native
speakers of American English and all spoke the same dialect,
usually characterized as General American. They were free from
speech, language, reading and hearing problems. All subjects
were considered to be untrained speakers.

Speech Materials and Recording Procedures

For the purpose of comparison with the Walker study (1988), the
first paragraph of the Rainbow passage (Fairbanks, 1960) was
selected as the oral reading text. The passage, especially its
first paragraph, has been frequently used for speech research as
well as in speech clinics. The first paragraph consisted of 98
words and took on average 30 seconds to read. Impromptu speech
was elicited by asking the subject to describe a picture. For
the impromptu speech, no attempt was made to replicate the
Walker study. The order of the tasks was counterbalanced so that
half of the subjects started with oral reading while the other
half started with impromptu speech.

Each subject was seated in a sound-treated booth. A condenser
microphone (Sony ECM50) was placed approximately 15 cm from the
subject’s lips. The voices were recorded at a transport speed of
7.5 ips on an AMPEX magnetic tape recorder located in an
adjacent room.

Perceptual  Analysis  of  Pause 

Five female graduate students were randomly selected from a
research methods course in the Department of Communication
Disorders and Speech Science, University of Colorado, to provide
perceptual judgments as to the location of pauses. The students
were native speakers of English and were free from problems in
speech, language, hearing and reading. Each student was seated
in a quiet room, listened to the oral readings played via a
loudspeaker (Ampex 622) and was instructed to place slash (/)
marks on a response form at “pauses” detected in the oral
reading. The response form had the first paragraph of the
Rainbow passage typed in double spacing. “Pause” was not defined
in the instructions and its definition was left to the students.
Each student was allowed to listen to the recordings as many
times as necessary. The results of pause identification were
used to determine appropriate values of input parameters to an
automatic durational analyzer (described below) and to verify
agreement in numbers and locations of pauses between the
perceptual and automatic analyses.

Durational  Analysis  Procedures 

The recorded voices were played back, fullwave rectified and
smoothed by an RC lowpass filter (Dobkin, 1969). The rectifying
and smoothing extracted the intensity envelope in real time.
This intensity envelope was digitized by a 12-bit
analog-to-digital converter at a rate of 1000 times per second,
and stored on a 386 microcomputer disk using CSRE (Canadian
Speech Research Environment) software.
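
The envelope extraction described above was performed with analog hardware before digitizing. A rough digital analogue can be sketched as follows. The function name, the one-pole smoother standing in for the RC filter, and the 10 ms time constant are illustrative assumptions; the study does not report the filter's time constant.

```python
import math

def intensity_envelope(signal, fs, tau_s=0.01, out_rate=1000):
    """Full-wave rectify, smooth with a one-pole lowpass, keep out_rate samples/s.

    tau_s is an assumed time constant (hypothetical value, not from the study).
    fs must be a whole multiple of out_rate for the simple decimation used here.
    """
    alpha = 1.0 - math.exp(-1.0 / (fs * tau_s))  # smoothing coefficient
    step = fs // out_rate                        # keep every step-th sample
    smoothed = 0.0
    envelope = []
    for i, x in enumerate(signal):
        smoothed += alpha * (abs(x) - smoothed)  # rectify, then lowpass
        if i % step == 0:
            envelope.append(smoothed)
    return envelope

# One second of a 200 Hz tone followed by one second of silence, at 8 kHz.
fs = 8000
tone = [math.sin(2 * math.pi * 200 * n / fs) for n in range(fs)]
env = intensity_envelope(tone + [0.0] * fs, fs)
print(len(env), round(env[900], 2), round(env[-1], 4))
```

During the tone the envelope settles near the mean of the rectified sine (about 0.64 of the peak) and falls to nearly zero in the silent second, which is the behaviour a threshold-based pause detector relies on.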

The  digital  intensity  envelope  was  subsequently 
analyzed  for  utterance  and  pause  durations  by  software 
developed  by  Horii  (1983a).  Given  a  specification  of  the 
maximum  amplitude  threshold  for  pause,  the  minimum 
duration  of  pause,  and  the  minimum  duration  of  utterance, 
the  program  identified  utterances  and  pauses  and  printed 
out  means  and  standard  deviations  of  utterance  and  pause 
durations,  number  of  utterances  and  speaking-time  ratio. 
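
The three input parameters named above suggest a simple run-based segmentation. The sketch below is not the Horii (1983a) program, whose internals are not given here; it is a minimal illustration assuming that sub-minimum silent gaps are absorbed into the surrounding utterance, that sub-minimum bursts are absorbed into the surrounding pause, that silence before the first and after the last utterance is discarded, and that speaking-time ratio is total utterance time divided by total utterance plus pause time. All function names are hypothetical.

```python
from itertools import groupby

def _runs(flags):
    """Collapse a boolean sequence into [value, length] runs."""
    return [[value, len(list(group))] for value, group in groupby(flags)]

def _absorb_short(runs, value, min_len):
    """Flip runs of `value` shorter than min_len, then merge equal neighbours."""
    merged = []
    for v, n in runs:
        if v == value and n < min_len:
            v = not v
        if merged and merged[-1][0] == v:
            merged[-1][1] += n
        else:
            merged.append([v, n])
    return merged

def _trim(runs):
    """Drop silence before the first and after the last utterance."""
    while runs and not runs[0][0]:
        runs = runs[1:]
    while runs and not runs[-1][0]:
        runs = runs[:-1]
    return runs

def analyze_envelope(envelope, threshold=0.2, min_pause_ms=100,
                     min_utt_ms=200, fs=1000):
    """Return utterance durations, pause durations (ms) and speaking-time ratio."""
    min_pause = min_pause_ms * fs / 1000.0   # minimum pause length in samples
    min_utt = min_utt_ms * fs / 1000.0       # minimum utterance length in samples
    runs = _trim(_runs([x > threshold for x in envelope]))
    runs = _absorb_short(runs, False, min_pause)      # short gaps join speech
    runs = _trim(_absorb_short(runs, True, min_utt))  # short bursts join pauses
    to_ms = 1000.0 / fs
    utterances = [n * to_ms for v, n in runs if v]
    pauses = [n * to_ms for v, n in runs if not v]
    total = sum(utterances) + sum(pauses)
    ratio = sum(utterances) / total if total else 0.0
    return {"utterances_ms": utterances, "pauses_ms": pauses, "str": ratio}

# Synthetic envelope: speech, a 50 ms dip (too short to be a pause),
# more speech, a 400 ms pause, speech, then trailing silence.
envelope = ([1.0] * 1000 + [0.0] * 50 + [1.0] * 1000 +
            [0.0] * 400 + [1.0] * 1600 + [0.0] * 100)
result = analyze_envelope(envelope)
print(result["utterances_ms"], result["pauses_ms"], round(result["str"], 2))
```

Means and standard deviations of the two duration lists can then be taken with any statistics routine. The defaults mirror the parameter values the study eventually adopted (.2 volts, 100 ms, 200 ms).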

Results 

Perceptual  Pause  Analysis 

Results of the pause identifications yielded unanimous agreement
among the five listeners for 18 of the 20 readings. For the
remaining two readings, two of the five listeners detected an
additional pause. For each of the twenty readings, the number
and locations of perceived pauses were noted. For those two
readings with nonunanimous results, the results of the majority
(3 of 5 listeners) were employed.

Reliability  and  Validity  of  Measurements 

The reliability and validity of the automatic duration analyzer
have been investigated and found satisfactory (Horii, 1983a).
For example, when square pulses of 1 and 10 pulses/sec were
submitted to the analyzer, results were within 2 milliseconds,
with an average difference of 0.3 milliseconds, from
oscillographic hand measurements. Because the recording quality
affects the reliability and validity, additional tests were
conducted for the voice samples under investigation. In
particular, values to be used for the program input parameters
(the maximum amplitude threshold for pause, the minimum pause
duration, and the minimum utterance duration) were carefully
explored before the final analysis.

Eventually, the following values were deemed appropriate: a
maximum amplitude threshold of .2 volts (about 34 dB below the
peak amplitudes, about 10 volts, of the intensity envelope), a
minimum pause duration of 100 ms and a minimum utterance
duration of 200 ms. These parameter values produced number and
locations of pauses identical to the perceptual results for each
of the twenty readings.
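
The "about 34 dB" figure can be checked with the usual decibel formula for amplitude quantities, 20 log10(threshold / peak). The short sketch below assumes the envelope voltage is treated as an amplitude rather than a power quantity; the function name is illustrative.

```python
import math

def level_re_peak_db(threshold_volts, peak_volts):
    """Level of the threshold relative to the peak, in dB (amplitude ratio)."""
    return 20.0 * math.log10(threshold_volts / peak_volts)

level = level_re_peak_db(0.2, 10.0)
print(round(level, 1))  # -34.0
```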

As an example, Figure 1 illustrates an intensity envelope with
cursors positioned at the beginning (A) and end (B) of the
second utterance. The graphics in the middle row show an
expanded waveform around the cursors (A and B), while the
graphics at the bottom show waveforms further expanded at the
cursors. The CSRE display shows that the duration between the
two cursors is 3630 ms. Pauses are indicated by arrows
(hand-drawn by the investigator after perceptual verification).
A computer printout from the automatic duration analyzer for the
same intensity envelope listed the duration of the second
utterance, in particular, to be 3629 ms. Overall, results of the
automatic analyzer and manual cursor positioning demonstrated a
Pearson product-moment correlation of .999 with an average
absolute measurement difference of less than 3 ms. The number
and locations of pauses were identical to the listeners’
responses.

Figure 1. A CSRE display of the intensity envelope with cursors
positioned at the beginning and end of the second utterance
(A-B). Arrows indicate pauses detected by the automatic duration
analyzer and verified perceptually.


Table 1.

Individual and group results for oral reading. Variables were
mean utterance durations (U), utterance duration standard
deviations (Usd), mean pause durations (P), pause duration
standard deviations (Psd) (all in milliseconds) and
speaking-time ratio (STR).

Subject      U     Usd      P     Psd    STR

Male Subjects
  1        3092   1319    406    197    .90
  2        2917   1172    368    114    .89
  3        3083    980    419    170    .89
  4        2121    701    512    248    .82
  5        2069    897    568    276    .80
  6        2044   1117    475    180    .82
  7        2261   1065    310    133    .89
  8        2506    584    492    166    .85
  9        3237   1500    372    244    .91
 10        2348    851    495    205    .84
 Mean      2568           442           .86
 S.D.       445            75           .04

Female Subjects
  1        2498   1114    627    318    .81
  2        3021   1315    360    168    .90
  3        2630   1417    427    213    .87
  4        2422   1333    332    148    .89
  5        3772   1263    528     73    .89
  6        2514    952    387    170    .88
  7        2871   1641    297    127    .91
  8        3484   1170    419     73    .91
  9        2321    623    321     98    .89
 10        2578   1020    571    209    .83
 Mean      2811           427           .88
 S.D.       457           107           .03

Table 2.

Individual and group results for impromptu speech. Variables
were mean utterance durations (U), utterance duration standard
deviations (Usd), mean pause durations (P), pause duration
standard deviations (Psd) (all in milliseconds) and
speaking-time ratio (STR).

Subject      U     Usd      P     Psd    STR

Male Subjects
  1        2018   1093   1490   1759    .58
  2        1324    661   1023    816    .57
  3        1395    902    721    472    .66
  4        1443    866    947    597    .61
  5        1813   1041    898   1094    .67
  6        2527   1753    645    676    .80
  7        2078   1519    888    819    .71
  8        1841   1077    781    530    .71
  9        1787    807    842    602    .68
 10        2235   1441    643    489    .78
 Mean      1846           888           .68
 S.D.       366           233           .07

Female Subjects
  1        1542    841   1950   1532    .45
  2        1966   1054    721    457    .74
  3        1825    949    800    511    .70
  4        1959   1048    637    376    .76
  5        1832   1255   1266   1358    .60
  6        1788    836   1207   1159    .60
  7        2004    895    970    719    .70
  8        1818   1460    735    544    .72
  9        2604   1510    518    285    .85
 10        2118    975    506    268    .81
 Mean      1966           931           .69
 S.D.       316           420           .11

Measurement  Results 

Tables 1 and 2 show the results of the durational analysis for
the oral reading and impromptu speech, respectively. The tables
show individual and group results for mean utterance duration
(U), utterance duration standard deviation (Usd), mean pause
duration (P), pause duration standard deviation (Psd) and
speaking-time ratio (STR). All durational measures are presented
in milliseconds.

Two-factor (gender x task) analysis of variance with repeated
measures revealed that the gender differences and the
gender-task interactions were nonsignificant at the .05 level
for utterance and pause durations and for speaking-time ratio.
The task differences, however, were significant at the .001
level for each of the three variables (utterance duration:
F(1,18) = 25.45; pause duration: F(1,18) = 39.12; speaking-time
ratio: F(1,18) = 63.66).

Because the male and female differences were nonsignificant, the
measurement results were combined. Table 3 shows means and
standard deviations (in parentheses) of the mean utterance and
pause durations in milliseconds and of the speaking-time ratio.
As seen in the table, the oral reading task produced longer mean
utterances (2689 ms), shorter pauses (434 ms) and a greater
speaking-time ratio (.87) than the impromptu speech (1906 ms,
909 ms and .69, respectively). The greater speaking-time ratio
was a necessary consequence of the increased utterance duration
and decreased pause duration in oral reading.


Table 3.

Means and standard deviations (in parentheses) of the mean
utterance durations (U), mean pause durations (P), both in
milliseconds, and speaking-time ratio (STR) for the oral reading
and impromptu speech by the twenty young adults.

Variables    Oral Reading    Impromptu Speech

U            2689 (467)      1906 (347)
P             434 (93)        909 (341)
STR           .87 (.04)       .69 (.09)

Discussion 

General  Findings 

The current investigation revealed that the oral reading was
associated with greater mean utterance durations and smaller
pause durations compared to impromptu (picture description)
speech. The findings were not surprising because of the nature
of the tasks. When individual data were examined, only one
subject had shorter mean utterance and only two subjects yielded
longer mean pause durations in oral reading than impromptu
speech. There was, however, no exception regarding greater
speaking-time ratios for oral reading than impromptu speech. On
the average, the mean utterance and pause durations and
speaking-time ratio of oral readings were approximately 141%,
48% and 126%, respectively, of the values for the impromptu
speech.
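
The three percentages follow directly from the combined group means reported for the two tasks (2689 ms, 434 ms and .87 for oral reading; 1906 ms, 909 ms and .69 for impromptu speech). A quick check, with variable names chosen only for illustration:

```python
# Combined group means from the study (U and P in milliseconds).
oral_reading = {"U": 2689, "P": 434, "STR": 0.87}
impromptu = {"U": 1906, "P": 909, "STR": 0.69}

# Oral reading value expressed as a percentage of the impromptu value.
percent = {name: round(100 * oral_reading[name] / impromptu[name])
           for name in oral_reading}
print(percent)  # {'U': 141, 'P': 48, 'STR': 126}
```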

Walker (1988) reported an average speaking-time ratio of .70 for
120 young adults reading the same text, i.e., the first
paragraph of the Rainbow passage. The present investigation
yielded speaking-time ratios ranging from .80 to .91 with a mean
of .87 for the twenty young adults. The current finding is
consistent with the findings of earlier studies (Horii, 1983b;
Horii and Ramig, 1987) that reported average speaking-time
ratios of about .85 for the same text by adult subjects.

The average STR value of .87 found in the present study is much
greater than the values reported by Fairbanks and Hoaglin
(1941), which ranged from .53 to .71. In their study, however,
the subjects were six amateur actors and were asked to read a
27-word passage in five different emotional states, i.e.,
contempt, anger, fear, grief and indifference. The passage was
“There is no other answer. You’ve asked me that question a
thousand times, and my reply has always been the same. It always
will be the same” (p. 85). With such a dramatic reading, there
could be more and longer pauses in the reading, yielding low STR
values.

For the impromptu speech, the average STR value of .69 found in
this study was difficult to compare with findings of other
studies, mainly because of the differences in the manner of
eliciting impromptu speech. Walker (1988) reported an average
STR of .62, while Till and Alp (1991) reported an average of .78
when their subjects were asked to engage in conversational
monologue with a listener. As stated earlier, the task (picture
descriptions, for example) and the subject matter would have
considerably affected the utterance and pause durations and
frequencies.

The Effect of the Maximum Amplitude Threshold for Pause

As mentioned earlier, the automatic analysis program uses the
maximum amplitude threshold to define a pause. To assess the
effect of the threshold on the durational measures, STR in
particular, two readings (one from the male and one from the
female group) were randomly selected and submitted to the
durational analyzer with various values of the amplitude
threshold.

Figure 2. STR as a function of the maximum amplitude threshold
of pause for randomly selected readings of a male subject #9
(x’s) and a female subject #3 (o’s). The threshold values
between the dotted lines yielded the number and locations of
pauses identical between the automatic analyzer and perceptual
results.

Results  are  summarized  in  Figure  2  where  the 
abscissa  is  the  amplitude  threshold  in  volts  and  the  ordinate 
is  the  resulting  STR  values.  The  x’s  represent  results  for  the 
male  subject  (#9)  and  the  o’s  for  the  female  subject  (#3). 
The  amplitude  thresholds  between  the  two  vertical  dotted 
lines  yielded  the  number  and  locations  of  pauses  identical 
to  those  perceptually  identified  by  the  majority  of  the  five 
listeners. 

As seen in the figure, the effect of the amplitude threshold was
systematic, as expected. As the threshold increased, the STR
decreased. When the threshold was set too low, such as .1 volts
in the figure, the entire envelope was above the threshold, and
thus no pause was detected, resulting in an STR of 1.0 for the
male subject. For the female subject, the same .1 volts
threshold produced only two pauses and the STR was an inflated
.97. On the other hand, when the threshold was set too high
(above .4 volts for these examples), the STR decreased, with
more pauses detected than in the perceptual results. Although
not shown in the figure, it should be obvious that considerable
changes occurred to the calculated means and standard deviations
of utterance and pause durations when the number of pauses (and
utterances) changed for the same reading, such as between the
thresholds of .4 and .6 volts. The figure shows, however, that
STR values were less affected by the change in the number of
pauses and utterances.
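
The threshold dependence described above can be reproduced qualitatively on made-up data. The sketch below uses a synthetic envelope, not the study's recordings: four trapezoidal "utterances" with a brief dip in the middle of each, separated by 400 ms stretches of low-level noise. The envelope shape, noise floor, dip level and function names are all illustrative assumptions, and the minimum-utterance rule is omitted for brevity.

```python
def ramp(lo, hi, n):
    """n samples moving linearly from lo toward hi."""
    return [lo + (hi - lo) * i / n for i in range(n)]

def synthetic_envelope(n_utts=4, floor=0.05, peak=5.0, dip=0.5):
    """Envelope in volts at 1000 samples/s (hypothetical data)."""
    env = []
    for _ in range(n_utts):
        env += ramp(floor, peak, 100)                     # onset
        env += [peak] * 700 + [dip] * 150 + [peak] * 650  # speech with a dip
        env += ramp(peak, floor, 100)                     # offset
        env += [floor] * 400                              # noise-floor gap
    return env

def str_and_pause_count(env, threshold, min_pause=100):
    """Speaking-time ratio and number of pauses for one threshold."""
    above = [x > threshold for x in env]
    if True not in above:
        return 0.0, 0
    first = above.index(True)
    last = len(above) - 1 - above[::-1].index(True)
    above = above[first:last + 1]        # ignore leading/trailing silence
    pause_samples = n_pauses = run = 0
    for is_speech in above:
        if is_speech:
            if run >= min_pause:         # count only gaps of 100 ms or more
                pause_samples += run
                n_pauses += 1
            run = 0
        else:
            run += 1
    return 1.0 - pause_samples / len(above), n_pauses

env = synthetic_envelope()
results = {t: str_and_pause_count(env, t) for t in (0.01, 0.2, 0.4, 0.6)}
for t, (ratio, n) in results.items():
    print(f"threshold {t:.2f} V: STR {ratio:.3f}, pauses {n}")
```

With the threshold below the noise floor no pause is found and the ratio is 1.0; at intermediate thresholds the three gaps between utterances are found; at the highest threshold the dips inside the utterances are also counted as pauses and the ratio drops further.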

Consideration of the effect of the amplitude threshold dictates
that the best threshold is the lowest threshold that yields the
same number and locations of pauses as the perceptual results.
This may not always be possible for some voice recordings with
poor signal-to-noise ratios. Furthermore, certain types of
speech, e.g., speech produced by stutterers and speakers with
dysarthria, may not yield near-unanimous perceptual
identification of pauses. Indeed, the term “pause” itself can
become a source of controversy both in terms of definition and
measurement.

Issues  of  the  Definitions  of  Pause 

The automatic duration analyzer defined the pause as segments of
the intensity envelope with a maximum amplitude of .2 volts and
a minimum duration of 100 ms. In essence, a “pause” was a
“silent gap” in the acoustic signals. Fortunately, for the oral
reading of the particular text used in the study, the “silent
gaps” were nearly identical to the listeners’ intuitive
definition of “pause”. The only exceptions occurred between
“look” and “but” in the sentence “People look, but no one ever
finds it.” For these instances, there were no “silent gaps” in
the acoustic signals. As perceptual studies have demonstrated,
“pause” can be perceived (via vowel prolongations prior to a
phrase boundary, for example) without silent gaps in the
acoustic signals. It should also be noted that the automatic
duration analyzer treats so-called “filled pauses” or hesitation
pauses with overt vocalizations such as “uh...” or “humm...” as
utterances. In oral reading, such filled pauses rarely occur,
and indeed none occurred in the present reading samples.

Consideration of these issues surrounding “pause” suggests that
the use of agreement between the automatic analyzer and
perceptual results may not always be an appropriate criterion
for determining the thresholds of pauses. STR values were found,
however, to be quite insensitive to variations of amplitude
thresholds, in contrast to mean utterance and pause durations.
Further experience in durational analyses will hopefully allow
determination of the amplitude threshold without rigorous
perceptual experiments.

Significance  of  STR  Measurements 

An STR value of .85, for example, means that 85% of the total
speaking time is spent for actual speaking and 15% for pauses.
As stated earlier, such information is useful in delineating
general characteristics of speech produced by individuals with
various speech disorders. More specific merits of STR
measurements, however, lie in the fact that, in oral reading
tasks at least, STR serves as a reasonable estimate of efficacy
of the speech production mechanism as reflected in temporal
patterns of respiration. In an investigation of respiratory
airflow associated with oral reading, Horii and Cooke (1978)
found exhalatory durations of their eight adult normal-speaking
subjects to be about 87% of the total respiratory cycle
(exhalation plus inhalation). Other investigators reported
similar values ranging from 80% to 90% of speech breathing of
normal-speaking subjects (Itoh, 1975; Itoh & Horii, 1985; Itoh
et al., 1982; Till & Alp, 1991).

When the exhalatory air is not efficiently used in connected
speech, utterance durations become shorter because the air
reservoir is expended more quickly, resulting in smaller
percentages of exhalatory durations. The percentage becomes
smaller yet if the speaker attempts to compensate for the
quicker consumption of the air reservoir by deeper and longer
inhalations. Such inefficient usage of the air reservoir can
occur at the laryngeal level and/or supraglottic articulatory
levels due to various structural and neurological problems.
Lower than normal STR values, therefore, would indicate less
efficient respiratory functions for speech. Prior to or
concurrent with investigations of various speech disorders,
further normative data must be accumulated because the number of
subjects in the present investigation, i.e., 20, was admittedly
small, and the age group was limited to young adults. The
effects of the types of reading texts, the manner of eliciting
impromptu speech, and the sample sizes warrant further
investigation.

Abstract

Impaired vocal fold motion may result from cricoarytenoid joint
fixation, bilateral vocal fold paralysis, or interarytenoid
scarring. Traditional surgical techniques have focused on
lateralizing or resecting the arytenoid for airway improvement.
This paper discusses three cases of bilateral reduced vocal fold
motion of neurogenic etiology treated with posterior cricoid
grafting to cause a wider resting position of the vocal folds
and arytenoids. Airway improvement occurred in all. Voice
results have been encouraging. Advantages of this procedure are
symmetrical vocal folds, no vocal fold or joint scarring, and a
larynx that remains a candidate for electrical pacing when that
becomes available. Acoustic and aerodynamic voice results are
presented. Results should be considered preliminary.

Introduction

Impaired vocal fold mobility may result from a variety of
disorders including cricoarytenoid joint fixation, neurologic
injury resulting in vocal fold paralysis or paresis, and
interarytenoid scarring. As has been recognized by other
authors, distinguishing between these disorders is sometimes
difficult even at the time of direct laryngoscopy.1 Distinction
is important, however, as the best treatment modality for each
situation may differ. Bilateral vocal fold paralysis in the
pediatric population, for example, is often transient. Rosin et
al. recently reported that 3 of 19 patients (16%) requiring
tracheotomy for bilateral vocal fold paralysis had spontaneous
resolution allowing decannulation. Six others were decannulated
following ventriculoperitoneal shunts for hydrocephalus. Another
seven, however, could not be decannulated at the time of their
study.2

Tracheotomy for airway compromise from impaired vocal fold
mobility has been the gold standard of treatment. Because of the
care and problems associated with tracheotomies, many surgeons
in the past have described various alternative techniques to
improve the glottic airway. These techniques have focused on
adequacy of the airway to allow decannulation while avoiding
aspiration and maintaining some voice. Various authors have
described arytenoidectomy or partial arytenoidectomy for
bilateral impaired vocal fold mobility, mostly in adults. Only a
few cases of arytenoidectomy in children have been reported.
Dennis and Kashima have described success with endoscopic CO2
laser partial cordectomy in adults with vocal fold impairment.7
Kashima also discussed his results with CO2 laser transverse
cordotomy for bilateral vocal fold impairment. Multiple methods
of lateral fixation of the vocal fold have had varying success,
again mostly in adults. Tucker has described his success with
laryngeal reinnervation.10

Interarytenoid division with long-term stenting has been
described for impaired vocal fold mobility from posterior
glottic stenosis. Goodwin et al. reported six adult patients,
five of whom were tracheotomy dependent, treated with posterior
glottic scar excision through a midline thyrotomy and long-term
stenting without grafts. All were successfully decannulated,
with normal vocal fold mobility in three and improved mobility
in the others. Two patients retained normal voices and the
remaining patients had improved voices, as judged subjectively.

There have also been reports of children with abnormal vocal
fold mobility secondary to interarytenoid scarring and posterior
glottic stenosis who have been treated with posterior
cricoidotomy and cartilage grafting. Zalzal reported 11 children
treated for posterior laryngeal stenosis by this method, five of
whom had associated vocal fold impairment. All five were
decannulated and had return of normal vocal fold mobility. Voice
was evaluated subjectively only, with all voices being “husky or
hoarse” and two of the five being worse than preoperatively.12
In another paper, Zalzal et al. reported vocal quality results
in 16 children decannulated following laryngeal reconstruction,
including five patients treated with posterior cricoidotomy and
cartilage grafting for abnormal vocal fold mobility associated
with stenosis. Voice evaluation consisted of analysis of
quality, pitch, volume, resonance, speaking rate,
intelligibility, and overall voice severity as judged by speech
pathologists. Two of the five were judged to have a severe, one
a moderate, and two a mild overall voice disorder. Most had
either prominent breathiness, hoarseness, or low volume.13
Cotton reported 61 children who underwent posterior glottic
cartilage grafting for laryngotracheal stenosis. Forty-seven of
these children also had some degree of bilateral vocal fold
impairment, with 20% being fixed bilaterally. Preoperative and
postoperative voice analysis was restricted to subjective
analysis by parents.14

This paper reviews the cases of three children with bilateral
impaired vocal fold mobility from presumed neurogenic etiology
treated with an extended posterior cricoid split and cartilage
grafting. In these three patients there was no evidence of
posterior glottic stenosis or interarytenoid scarring as a cause
of the vocal fold immobility. The goal was to accomplish
arytenoid separation and a larger posterior glottis, while
maintaining laryngeal symmetry, mobile arytenoids, and healthy
membranous vocal folds.

Materials  and  Methods 

Subjects 

These patients, ranging in age from 23 months to 7 years,
underwent posterior cricoid split with costal cartilage grafting
for bilateral impaired vocal fold mobility. A summary of each
patient is found in Table 1.

Preoperative  Evaluation 

Thorough history and physical examinations were performed on all
patients. Flexible laryngoscopy and, in some instances, flexible
bronchoscopy were used for dynamic airway evaluation. Rigid
laryngoscopy and bronchoscopy under general anesthesia were
performed and videotaped. Patients were selected who had no
symptoms of aspiration preoperatively and who demonstrated
adductory motion on laryngoscopy. Although no electromyography
was performed, clinically these children demonstrated poor or no
fold abduction. They did show adduction, indicating ability to
close the glottis.

Surgical  Approach 

The surgical approach to the posterior cricoidotomy with
cartilage graft is similar to that previously described by
Cotton and Zalzal.12-14 The posterior cricoidotomy is performed
and carried superiorly slightly into the interarytenoid muscle.
If there is interarytenoid scarring, the scar tissue is divided
but not excised. The uninvolved interarytenoid muscle is left
intact. Some mild arytenoid separation will occur by dividing
the posterior cricoid; greater separation occurs with division
of the fibrous network immediately superior to the cricoid.
Further separation may be accomplished with progressive
(inferior to superior) division of the interarytenoid muscle.
However, we do not recommend complete or extensive division of
the interarytenoid muscle, since adequate arytenoid separation
occurs without it and the interarytenoid muscle is a likely key
adductor whose function will ideally be preserved.
Interarytenoid muscle function is probably an important factor
in preventing postoperative aspiration. Surgical injury to this
muscle should be minimized. In cases of impaired mobility due to
interarytenoid scarring, muscle division may be required.

It is not necessary to place part of the cartilage graft into
the interarytenoid area. This superior positioning of the graft
may in fact impair arytenoid adduction, leading to aspiration.
Therefore, the cartilage is sutured to the cricoid without
superior cartilage extension. We leave a perichondrial flap
extending superiorly to cover the exposed inferior portion of
the interarytenoid muscle, although this may not be necessary.

A laryngeal stent is placed from above the level of the stoma
to a few millimeters above the false folds. The stent we now
use for this procedure is an endotracheal (ET) tube which has
been halfway crimped horizontally with a towel clamp and then
autoclaved for 30-45 seconds so that the ET tube maintains the
crimped shape (Photos 1A-D; see center-bound photographic
plate). (Although this technique was taught to us by Dr. Rodney
Lusk in 1985, I have heard that others have independently used
this technique as well.) This creates a stent with a V shape
anteriorly to fit in the anterior commissure and a rounded,
broad shape posteriorly to fit in the posterior glottis. The
stent above and below the crimp is round. We have switched to
this crimped stent for this particular procedure since we
suspect that rounded stents have occasionally caused some
membranous fold compression atrophy and abduction of the
arytenoids, leading to some dysphonia and breathiness, though
this has not been confirmed by comparison studies. Preoperative
and postoperative laryngeal configurations show that the
posterior glottis is open, but the membranous folds remain more
medialized (Photos 2A and B; see photo plate). The
postoperative configuration is due to two factors: arytenoids
in a wider resting separated state, and arytenoids which are
still medially rotated in a resting or semi-adducted position.
The stent is secured with 2-0 Prolene sutured through the
cricoid and tied over the strap muscles subcutaneously, and is
left for approximately two to four weeks.

Voice  Analysis 

Voice recordings were made on all three patients, allowing
evaluation of mean habitual fundamental frequency, jitter, and
overall quality. Patients #1 and #2 were cooperative enough to
allow limited aerodynamic evaluation and more detailed voice
analysis including maximum phonation time, DC airflow, mean
habitual intensity, and intensity range. Patient #3 was too
young to undergo additional voice or aerodynamic evaluation.

Results 

Average follow-up time was 23 months. All three patients were
successfully decannulated, with time to decannulation ranging
from 13 days to three months postoperatively. Decannulation was
delayed beyond the immediate stent removal period in one
patient (#2) because of postoperative granulation tissue. This
patient also had persistent preoperative supraglottic collapse
and arytenoid prolapse from laryngomalacia, which further
delayed her decannulation. All patients continued to have
abnormal vocal fold mobility postoperatively. All patients are
able to participate in normal play activity with adequate
airway function according to their parents. Audible breathing
is still present in the patient with laryngomalacia. No patient
has had difficulty with aspiration.

Voice results are listed in Table 1. Perceptual voice quality
was judged by our speech pathologist on a seven point scale
with one being normal and seven aphonic. Patient #1 was rated
as normal and the others were given a rating of two because of
mild breathiness. All patients had age appropriate mean
fundamental frequency. Mean jitter was normal in patients #1
and #2 but increased in patient #3. Patients #1 and #2
underwent additional aerodynamic and voice analysis. The AC:DC
airflow ratio was normal in both patients. Signal-to-noise
ratio was normal in patient #2 and just slightly low in patient
#1. DC airflow was within normal limits (using adult values)
for patient #2 but probably represents a high flow rate for
children. DC flow was elevated for patient #1. This indicates a
high flow rate through the glottis during speech. Nevertheless,
both patients' ability to entrain the flow into vocal pulses
(AC:DC ratio) was adequate (ratios were 1.6 and 3.4), so that
they have no or minimal breathiness and normal intensity.

Discussion 

Bilateral vocal fold immobility has been a difficult problem to
manage clinically. The gold standard of treatment at most
institutions has been tracheotomy. Many techniques have been
described, including arytenoidectomy, vocal fold
lateralization, partial cordectomy/cordotomy, and laryngeal
reinnervation, to avoid tracheotomy or to allow decannulation.
The objective of treatment focuses not only on airway adequacy
but also on the avoidance of aspiration and on the preservation
or improvement of voice. In the pediatric population, some
patients are found who do not have complete bilateral vocal
fold paralysis, but rather demonstrate impaired abduction of
both vocal folds. These patients have no aspiration and
adequate voice, but experience significant airway obstruction.

Our clinical experience indicates that adduction generally
seems to be a stronger component than abduction of the vocal
folds. In unilateral paralysis it is quite common to see
over-adduction by the opposite fold. When some adduction is
present preoperatively, separating the arytenoids (instead of
arytenoidectomy) capitalizes on this adductor motion to provide
glottic closure. This may explain in part why patients in this
small series did not experience aspiration postoperatively.

Posterior cricoidotomy with cartilage grafting and stenting for
treatment of posterior glottic and subglottic stenosis has had
good success (refs. 12-14). Posterior cricoidotomy with
cartilage grafting has been indicated for posterior glottic
and/or subglottic stenosis and total glottic or subglottic
obstruction (ref. 12). We propose expansion of these
indications to include selected patients with bilateral
impaired vocal fold mobility secondary to abductor paralysis or
paresis with some residual adductory motion.

Selection criteria for this procedure would include some
residual adduction of the vocal folds, presence of laryngeal
sensation, and intact swallowing function, as aspiration can be
a serious complication if careful selection of patients is not
done. Certainly, if there is evidence of aspiration
preoperatively, any arytenoid separation would not be
indicated. We are encouraged by the airway and voice results
obtained by using this procedure. Our subject numbers are few
and the results are considered preliminary. No comparison with
other methods was performed. Although this procedure is not
recommended for all patients who have bilateral vocal fold
paralysis or cricoarytenoid joint fixation, we feel that it
does offer advantages to those who have residual adductor vocal
fold motion. This procedure is not felt to be more advantageous
than standard arytenoidectomy or lateralization procedures in
patients with no vocal fold mobility.


Table 2.
Advantages/Disadvantages of Arytenoid Separation Compared to
Arytenoidectomy

Advantages

- Preserves any residual vocal fold motion
- No membranous vocal fold scarring
- Treats concurrent interarytenoid scar or posterior subglottic
  stenosis
- Laryngeal symmetry maintained
- Preserves vocal structures for future rehabilitation
  developments, i.e., laryngeal pacing or reinnervation

Disadvantages

- Tracheotomy required
- Open procedure: longer operating time, possibility of
  anterior commissure misalignment
- Laryngeal stenting required
- Donor site morbidity

Photo 1A) Edna nonperforating towel clamp.


Photo 1B) Application of clamp to curved portion of endotracheal tube.
Clamp is applied to half of the endotracheal tube so that the anterior
half is tightly together and the posterior half is rounded. Stent and
clamp are then placed into the autoclave for 45 seconds. Clamp is then
removed.


Photo 1C) Photos of stent after autoclave and clamp removed.


Photo 1D) Notice that the internal view of the stent shows a tear drop
configuration to the part that will be positioned at the glottis. Stent
will be placed so that membranous cords fit into the tightly clamped area
while the rounded portion fits into the posterior glottis.


Photo 2A) Digitized images captured and imported from videotape of
normal abduction and adduction of the human larynx. Notice that during
normal abduction the vocal processes rotate outward and the entire
arytenoids slightly separate from each other. Both actions provide a wide
glottic airway.


Photo 2B) Digitized images captured from videotape of the larynx before
and after arytenoid separation surgery as described in the paper. Note
that the bodies of the arytenoids are further separated but the
arytenoids are still in a relatively adducted position. If adductory
motion is still somewhat intact and the interarytenoid muscle functions,
then the arytenoid bodies will still approximate during adduction.


Photo 3) Histological picture demonstrating the differences in density
of fibroblasts in the lamina propria of the vocal fold mucosa of an adult
(photo A, at left) and a 4-year-old child (photo B, at right).
Hematoxylin-Eosin stain. (From Hirano M, Kurita S, Nakashima T: Growth,
development, and aging of the human vocal folds, in Bless, Diane M; Abbs,
Janet H (eds.): Vocal Fold Physiology: Contemporary Research and Clinical
Issues, San Diego, California, College-Hill Press, 1983, page 36. Used by
permission).

Photo 4) Goat larynx specimen after 3 month placement of Cotton/Lorenz
stent in larynx. Section through vocal process of arytenoid shows erosion
of cartilage. (Hematoxylin-Eosin stain, 30X).


Introduction

Adduction of the larynx at the level of the vocal folds and
arytenoid cartilages is a primary peripheral and mechanical
control variable in phonation. Laryngeal qualities from breathy
to constricted phonation are dependent on glottal adduction
(e.g., Scherer, Gould, Titze, Meyers and Sataloff, 1988b). A
noninvasive measure of laryngeal adduction is therefore of
importance and interest for both theoretical and applied
purposes.

The electroglottograph is a noninvasive instrument that
provides a signal related to glottal kinematics (see Baken, 1987
and 1992; Colton & Conture, 1990; and Orlikoff, 1991, for a
relatively thorough review of principles, history, pitfalls,
and relationships to laryngeal function). Values of the
electroglottograph (EGG) waveform function correspond strongly
to the amount of contact area between the two vocal folds, but
not in ways that are completely understood or straightforward
(Childers & Krishnamurthy, 1985; Childers, Alsaka, Hicks &
Moore, 1987; Anastaplo & Karnell, 1988; Scherer, Druker &
Titze, 1988a; Childers, Hicks, Moore, Eskenazi & Lalwani, 1990;
Titze, 1990). The shape of the EGG waveform may be related to
specific configurations and motions of the vocal folds relevant
to normal, abnormal, and trained voices (e.g., Fourcin, 1974;
Titze, 1984, 1989; Dejonckere & Lebacq, 1985; Childers, Alsaka,
Hicks & Moore, 1986; Baken, 1987; Scherer & Titze, 1987;
Gerratt, Hanson & Berke, 1987; Painter, 1988; Motta, Cesari,
Iengo & Motta, 1990; Brown & Scherer, 1992).

This study examines a simple measure of the waveform of the
electroglottograph, called the EGGW measure, to determine its
relationship to other measures of adduction, including a direct
measure of the gap between the vocal processes of the arytenoid
cartilages. The results will show that EGGW appears to be a
significant measure of adduction, at least for the limited
number of subjects and phonatory conditions reported in this
study.

Definition  of  the  Measure  EGGW 

Figure 1 illustrates an electroglottographic waveform during
normal phonation and the definition of the simple measure EGGW
(the W stands for width). At the 25% level (Orlikoff, 1991; cf.
Rothenberg & Mahshie, 1988, who used 35%, and Higgins &
Saxman, 1993, who used 40%), the distance A on Fig. 1
corresponds to an approximate time of glottal closure, and
distance B to an approximate time of glottal opening. The ratio
of A to A+B is an estimate of the portion of the cycle during
which the glottis is closed. EGGW is equal to A divided by A+B,
and could be called a glottal closed quotient. EGGW is obtained
for each cycle of phonation. The type of electroglottograph
used throughout this study was the Research Laryngograph
produced by Dale Teaney (1987).

EGGW = A/(A+B)

Figure 1. Definition of the EGGW measure taken at the 25% height
location on the electroglottograph waveform. The upper portion of the
waveform corresponds to maximum glottal closure (or maximum glottal
contact area), and the lowest portion of the waveform to maximum glottal
opening (or minimum glottal contact area).
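
The definition EGGW = A/(A+B) can be illustrated with a minimal sketch. It is not the analysis software used in the study; it assumes one uniformly sampled cycle with larger values meaning more vocal fold contact, takes the criterion level as a fraction of the peak-to-peak amplitude above the cycle minimum, and approximates A by counting samples above that level (no interpolation). The names `eggw` and `demo_cycle` are hypothetical.

```python
import math

def eggw(cycle, level=0.25):
    """Estimate EGGW = A/(A+B) for one EGG cycle.

    cycle: uniformly spaced samples spanning exactly one period,
           larger values = more contact (assumed polarity).
    level: criterion height as a fraction of peak-to-peak amplitude.
    """
    lo, hi = min(cycle), max(cycle)
    if hi == lo:
        raise ValueError("flat cycle: no amplitude to threshold")
    threshold = lo + level * (hi - lo)
    # A is approximated by the number of samples above the criterion level;
    # A+B is the whole cycle.
    closed = sum(1 for s in cycle if s > threshold)
    return closed / len(cycle)

# Synthetic stand-in for one EGG cycle (a pure sinusoid, not real data).
demo_cycle = [math.sin(2 * math.pi * k / 1000) for k in range(1000)]
print(round(eggw(demo_cycle, level=0.25), 3))
```

A real EGG cycle is asymmetric, so the sinusoid only exercises the arithmetic; raising `level` to 0.50 or 0.75 gives the other criterion heights.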

Comparison  Measures 

Titze (1984) suggested the Abduction Quotient, Qa, a ratio of
the width of the glottis essentially between the vocal
processes to twice the amplitude of motion of a vocal fold. Qa
is one of a number of measures given by Titze (1984) obtained
from a theoretical approach to the mechanics of motion of the
vocal folds. The Qa measure was obtained from the software
analysis and synthesis program GLIMPES. Qa tends to decrease as
vocal quality changes from breathy to normal to pressed
(Scherer et al., 1988b). Values of Qa above approximately 0.5
are associated with hypoadduction, and below -1.0 with
hyperadduction (Scherer et al., 1988b).

The derivative of the EGG signal may give prominent positive
and negative peaks that can be used to approximate the glottal
open quotient (e.g., Childers et al., 1990). As Figure 2 helps
to illustrate, the positive and negative peaks of the EGG
derivative refer to locations near glottal closure and opening,
respectively, during which the EGG signal changes (increases
and decreases, respectively) the fastest. The distance B of
Figure 2 divided by A+B is a quotient, designated Qodegg, and
is an approximation to the glottal open quotient.
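
As a hedged illustration of the Qodegg idea, the sketch below differentiates one EGG cycle and locates the positive and negative derivative peaks. It assumes that A runs from the positive peak (near closure) to the negative peak (near opening) and that B is the remainder of the period; the function name `qodegg` and the rectangular test cycle are hypothetical and are not taken from the study.

```python
def qodegg(cycle):
    """Approximate the open quotient from derivative peaks of one EGG cycle.

    cycle: uniformly spaced samples spanning exactly one period,
           larger values = more contact (assumed polarity).
    """
    n = len(cycle)
    # Cyclic first difference stands in for the differentiated EGG signal.
    degg = [cycle[(k + 1) % n] - cycle[k] for k in range(n)]
    closing = max(range(n), key=lambda k: degg[k])  # fastest increase
    opening = min(range(n), key=lambda k: degg[k])  # fastest decrease
    a = (opening - closing) % n  # closure peak to opening peak
    b = n - a                    # opening peak to next closure peak
    return b / (a + b)

# Idealized cycle: contact for 300 of 1000 samples (illustrative only).
demo_cycle = [1.0 if 100 < k <= 400 else 0.0 for k in range(1000)]
print(qodegg(demo_cycle))
```

Real signals are noisy, so in practice the derivative would likely need smoothing before peak picking; that step is omitted here.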

If the glottis is viewed with stroboscopy and recorded onto
video tape, the glottal open quotient can be estimated as the
ratio of the number of frames in which the glottis appears to
be partly to most fully open to the total number of frames for
the entire glottal phonatory cycle, giving Qostrb.


Figure 2. Definition of the glottal open quotient obtained by using the
differentiated EGG waveform. The upper trace is the differentiation of the
lower EGG waveform. Markers on the upper trace are taken at the
maximum and minimum values of the differentiated EGG waveform; they
delimit the intervals A and B along the time axis.


Relationship Between EGGW and Qa

Seven normal adult community actors (4 males and 3 females, age
range of 23-35 years, with no reported history of nontransient
vocal problems) were asked to produce three prolonged /a/
vowels in a steady manner at comfortable pitch and loudness
levels (and equal effort levels) for each of the vocal
qualities breathy, normal, and pressed (or constricted). The
middle of each EGG recording was digitized for one second by a
16-bit (effective 15-bit) analog-to-digital system (Digital
Sound Corporation 200 Audio Data Conversion System) at 20,000
samples per second, and stored in a VAX 11/750 computer.
Analyses were performed on three consecutive cycles near the
beginning, middle, and end of each utterance. About 25% of the
data (Qa only) were discarded because GLIMPES could not run
successfully (Scherer et al., 1988b; see also Scherer & Titze,
1987).

Figure 3 (top). Relationship between the abduction quotient (Titze, 1984)
and the EGGW measure taken at the 25% height of the EGG waveform for
7 community actors with normal voices. The subjects prolonged the vowel
/a/ over a wide range of intended voice qualities from very breathy to very
pressed or constricted. The vertical dashed lines mark expected regions
of hyperadduction and hypoadduction determined from Scherer et al.,
1988b. Figure 4 (bottom). Relationship between the abduction quotient
and the EGGW measure for two normal male subjects, BB and RS, as a
replication of the study shown in Fig. 3.

Figure 3 shows the data comparing EGGW and Qa fit to a cubic
equation with an R² of 0.877. This suggests a reasonably strong
relationship between these two variables. On a retest using a
professional tenor (BB) and a non-professional bass-baritone
(RS), again using comfortable pitch for sustained /a/ but over
a wide range of adduction intentions, the relationship shown in
Figure 3 was supported, as shown in Figure 4 (average
difference of 11.9%, sd = 18.6%, between the data and the cubic
fit of Figure 3).

The vertical dashed lines on Fig. 3 and Fig. 4 represent
approximate markers for perceived hyperadduction and
hypoadduction (Scherer et al., 1988b). The corresponding range
between these conditions (the "normal" range) involves values
of EGGW between 0.4 and 0.6. This would suggest that for normal
larynxes, values of EGGW below 0.4 may correspond to the
perceptual label of hypoadduction and above 0.6 to
hyperadduction.
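
The suggested ranges can be written as a small lookup. This is only a sketch of the tentative boundaries stated above (0.4 and 0.6), which the text offers for normal larynxes and a limited subject pool; the function name `adduction_label` is hypothetical, and the cutoffs are left as parameters because they are not established norms.

```python
def adduction_label(eggw, low=0.4, high=0.6):
    """Map an EGGW value (25% level) to a tentative perceptual label.

    low, high: approximate boundaries of the "normal" range; these are the
               suggested values from the text, not validated thresholds.
    """
    if not 0.0 <= eggw <= 1.0:
        raise ValueError("EGGW is a quotient and must lie in [0, 1]")
    if eggw < low:
        return "hypoadduction"
    if eggw > high:
        return "hyperadduction"
    return "normal"

for value in (0.3, 0.5, 0.7):
    print(value, adduction_label(value))
```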

Relationship Among EGGW25, EGGW50, EGGW75 and Qa

The measure EGGW discussed above (and throughout this report)
was taken at the 25% amplitude location from the baseline of
the EGG waveform. The measure can be taken at any reasonable
height location, however. Using the data associated with Figure
3, Table 1 shows the correlations among EGGW taken at the 25,
50 and 75% height levels, and Qa. The correlations are
reasonably high (minimum magnitude 0.711). EGGW at the 50%
level may be the measure of choice when there appears to be too
much noise or waveform distortion on the lower portions (open
glottis region) of the EGG waveform. A reasonable
correspondence between the EGGW50 and EGGW25 values is y =
1.067x + 0.081, where x is the EGGW50 value and y is the
associated predicted EGGW25 value. The highest correlation is
between the measures EGGW50 and EGGW75, suggesting that these
measures are essentially redundant. The table also indicates
that, of the three EGGW measures, EGGW25 is the most highly
correlated with Qa (r = -0.912).
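
For readers who measure at the 50% level because of noise near the baseline, the reported correspondence y = 1.067x + 0.081 can be applied directly. The sketch below simply encodes that line and its algebraic inverse; the function names are hypothetical, and the coefficients describe only the data set associated with Figure 3.

```python
SLOPE = 1.067      # reported slope, EGGW25 versus EGGW50
INTERCEPT = 0.081  # reported intercept

def eggw25_from_eggw50(eggw50):
    """Predict EGGW25 from a measured EGGW50 using the reported line."""
    return SLOPE * eggw50 + INTERCEPT

def eggw50_from_eggw25(eggw25):
    """Algebraic inverse of the reported line (not a separate regression)."""
    return (eggw25 - INTERCEPT) / SLOPE

print(round(eggw25_from_eggw50(0.5), 4))
```

Note that inverting a regression line is not the same as regressing x on y, so the inverse should be treated as a convenience only.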


Table 1.

Pearson product moment correlation values for EGGW
and Qa measures. N for Qa correlations is 147, and 189
for the other correlations.

            EGGW50    EGGW75    Qa
EGGW25      0.820     0.711     -0.912
EGGW50                0.966     -0.854
EGGW75                          -0.715


Figure 5. Relationship between the EGGW measure and the glottal open
quotient obtained by counting frames of stroboscopic images of the larynx
of subject BB.


Relationship  Between  EGGW  and  Qostrb 

A Wolf stroboscopic system was used with subject BB to
determine the open quotient of the larynx by counting the
number of frames during which the glottis was partly to fully
open, and the number of frames for the entire cycle. EGG was
recorded simultaneously. A comfortable pitch and loudness was
used. Figure 5 shows the data for corresponding Qostrb and EGGW
measures (the quantity 1-Qostrb, the equivalent to a closed
quotient, is used in the figure). The figure suggests that, for
a wide range of adduction, video frame counting and the EGGW
measure not only are strongly related (r = 0.93), but the
values of 1-Qostrb and EGGW at the 25% level are nearly the
same if the corresponding linear fit line is considered. The
relation between EGGW and 1-Qostrb is given by 1-Qostrb =
0.999EGGW + 0.015 in Fig. 5.

Relationship  Between  EGGW  and  Qodegg 

EGGW from the non-differentiated EGG signal, and the open
quotient from the differentiated EGG signal, were obtained for
both subjects BB and RS. The results are shown in Figure 6. The
figure indicates that the EGGW measure yielded values larger
than did 1-Qodegg, but with a strong relationship (r = 0.982
for the two subjects combined). The difference between values
for EGGW and 1-Qodegg appears to grow as adduction increases.
EGGW is related to 1-Qodegg in Fig. 6 by the linear equation
1-Qodegg = 0.902EGGW - 0.0139.

Figure 6. Relationship between the EGGW measure and the glottal open
quotient obtained from the differentiated EGG waveform for subjects BB
and RS.


Figure 7. Relationship between the markings for EGGW on the EGG
waveform, and the markings for the closed quotient on the derivative of the
EGG waveform. The markings on the EGG derivative (giving distance d1)
are closer together than are those for the 25% cut for EGGW (giving
distance d2), showing the reason for the larger EGGW measures of glottal
closed quotient in Fig. 6.


Figure 7 helps to explain the discrepancy between EGGW and
Qodegg. The peaks of the derivative of the EGG waveform tend to
fall in a narrower range (d1) than the 25% height markings (d2)
for the EGGW measure. As adduction decreases, the discrepancy
may decrease, as suggested by Fig. 6.
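
The two linear fits reported for Figs. 5 and 6 can be compared numerically. The sketch below only encodes those published lines (1-Qostrb = 0.999EGGW + 0.015 and 1-Qodegg = 0.902EGGW - 0.0139); the function names are hypothetical, and the fits apply to subjects BB and RS only.

```python
def closed_quotient_strobe(eggw):
    """1 - Qostrb predicted from EGGW (linear fit reported for Fig. 5)."""
    return 0.999 * eggw + 0.015

def closed_quotient_degg(eggw):
    """1 - Qodegg predicted from EGGW (linear fit reported for Fig. 6)."""
    return 0.902 * eggw - 0.0139

def degg_discrepancy(eggw):
    """How much EGGW exceeds the derivative-based closed quotient."""
    return eggw - closed_quotient_degg(eggw)

for value in (0.3, 0.5, 0.7):
    print(value, round(degg_discrepancy(value), 4))
```

Because the Fig. 6 slope is below one, the predicted discrepancy grows with EGGW, which is consistent with the observation that the two measures diverge as adduction increases.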


Figure 8. Tracings of laryngeal images for subjects RS and BB indicating
the various measures. VPG = vocal process gap; IAG = interarytenoid gap;
LAW = oblique width of the left cuneiform cartilage of subject RS; RAW =
oblique width of the right cuneiform cartilage of subject BB; POGL =
posterior glottal length; FVFG = false vocal fold gap.

Relationship  Between  EGGW  and  Glottal  Distance 
Measures 

The larynx of subject RS was viewed using a Wolf rigid
laryngoscopic system and recorded onto video tape during a wide
range of adduction conditions while producing an open
hypopharynx vowel at comfortable pitch and loudness. Under
visual observation and video recording, a length of cleaned
soldering wire with a turned tip was passed through the vocal
tract airway. The end was placed on top of the left cuneiform
cartilage. This permitted the estimation of the superior
oblique width of the left cuneiform cartilage (LAW as shown in
Fig. 8), and thus also an approximation of the width of the gap
between the vocal processes (VPG). The vocal process gap and
the oblique diameter of the left cuneiform cartilage were
measured directly on the video monitor. Measurements were made
by two people, and the calculated maximum error expected for
measurements of the vocal process gap in cm was +/- 13.4%. This
error was calculated using measurement variabilities for the
VPG monitor measurement, the LAW monitor measurement, the
estimate of the actual LAW measure determined from the width of
the tip of the soldering wire (including the variability for
the solder width measurement), and the height discrepancy
between the level of the true vocal folds and the top of the
cuneiform cartilage (assumed to be 1 cm). Figure 8 is a tracing
and schematic of the glottal measures for subjects RS and BB,
respectively.

Figure 9 shows the relationship between EGGW and the vocal
process gap VPG for subject RS. The figure strongly suggests a
reduction in the space between the vocal processes (greater
adduction) as EGGW increases. The data suggest that the vocal
processes touch when EGGW is between 0.60 and 0.65. For the
nonlinear relationship shown in Figure 9, VPG (cm) =
1.205EGGW² - 1.571EGGW + 0.511, for 0.2 ≤ EGGW ≤ 0.65. The data
suggest, for example, that for a vocal process gap of 0.1 cm,
EGGW equals approximately 0.36 for this subject. It is also
noted that a value of EGGW = 0.6, the value near which the
vocal processes touch, corresponds to the perceptual boundary
of hyperadduction as discussed above for Fig. 3.

Figure 9 (upper left). Relationship between the vocal process gap VPG and EGGW for subject RS. Figure 10 (upper right). Relationship between VPG/
RAW and EGGW for subject BB. Figure 11 (lower left). Relationship between the interarytenoid gap IAG and EGGW for subject RS. Figure 12 (lower
right). Relationship between the false vocal fold gap FVFG and EGGW for subject RS.
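
The quadratic fit for subject RS can be checked numerically. The sketch below assumes the coefficients read 1.205, -1.571 and 0.511 (the first is partly garbled in the scan) and the validity range 0.2 to 0.65; it reproduces the worked value quoted in the text and solves for the EGGW at which the fitted gap reaches zero. Function names are hypothetical, and the fit describes one subject only.

```python
import math

# Coefficients of VPG (cm) = C2*EGGW^2 + C1*EGGW + C0, as read from the text.
C2, C1, C0 = 1.205, -1.571, 0.511

def vpg_cm(eggw):
    """Fitted vocal process gap in cm for subject RS, valid for 0.2..0.65."""
    if not 0.2 <= eggw <= 0.65:
        raise ValueError("fit is only stated for 0.2 <= EGGW <= 0.65")
    # Clamp at zero: a gap cannot be negative once the processes touch.
    return max(0.0, C2 * eggw ** 2 + C1 * eggw + C0)

def eggw_at_contact():
    """Smaller root of the quadratic: EGGW where the fitted gap closes."""
    disc = C1 * C1 - 4.0 * C2 * C0
    return (-C1 - math.sqrt(disc)) / (2.0 * C2)

print(round(vpg_cm(0.36), 3))
print(round(eggw_at_contact(), 3))
```

With these coefficients the gap at EGGW = 0.36 comes out close to 0.1 cm and the fitted contact point falls between 0.60 and 0.65, which agrees with the statements in the text.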

A similar experiment was performed with subject BB, although
without absolute measures of the vocal process gap. The larynx
of subject BB was video taped with the Wolf system, and
laryngeal images seen on the video monitor were copied to a
Tektronix 4632 hard copy unit. The VPG measure was made at the
visually consistent region where the viewed right cuneiform
border intersected the vocal process border (ref. Fig. 8). The
value of the VPG was normalized by the oblique diameter of the
right cuneiform cartilage (VPG/RAW). Actual gap values were not
obtained.


Figure 10 illustrates that the relationship between EGGW and
VPG/RAW for subject BB appears linear. Again the data suggest
that the vocal processes touch when the EGGW value is between
0.60 and 0.65 (the best fit line suggests 0.64, whereas there
is one data point near VPG = 0 at approximately 0.575).

After the vocal processes touch, greater adductory forces can
approximate the arytenoid cartilages further. For subject RS,
the medial boundaries of the arytenoids (cartilaginous glottis)
were viewable. The interarytenoid gap (IAG, ref. Fig. 8) was
approximated by measuring the video monitor distance between
the bilateral supero-medial arytenoid cartilage eminences, and
normalized by the left oblique cuneiform diameter. Figure 11
shows the IAG measure (maximum measurement error of +/- 11.2%)
versus the EGGW measure for a wide range of adduction. This
figure strongly suggests a change in the relationship near a
value of EGGW equal to 0.65, the approximate value
corresponding to a vocal process gap of zero. With greater
adduction, IAG decreases rapidly as EGGW increases slowly. Near
the lowest values of IAG, the scatter of EGGW is relatively
high. The data suggest that values of EGGW greater than 0.65
correspond to forceful adduction of the arytenoid cartilages.
Smaller IAG values together with EGGW values greater than 0.65
suggest effective compression at the vocal processes and
"closer" vocal folds. The scatter of EGGW data corresponding to
values of IAG below about 0.3 cm suggests adjustments of the
thyroarytenoid muscles, interarytenoid muscles, and perhaps
subglottal pressure, resulting in a variety of widths of the
closed glottis portion of the EGG waveform. It is noted that
the reported distances for the IAG measure may be unique to
subject RS due to individual differences in the structure of
the arytenoid cartilages and in adductory function.

Figure 13. Relationship between FVFG/RAW and EGGW for subject BB.

Figure 14. Relationship between the posterior glottal length POGL and
EGGW for subject RS.

Figure 15. Relationship between POGL/RAW and EGGW for subject BB.

Data shown in Fig. 11 (and Fig. 6) suggest that the total range
of expected EGGW values may be 0.15 to 0.80.

Also examined were the distance changes between the ventricular
folds (FVFG, the false vocal fold gap) and the
anterior-posterior distance of the viewable cartilaginous
glottis (POGL, the posterior glottal length) as defined in Fig.
8. Figure 12 shows the data for FVFG versus EGGW for subject RS
(the estimated maximum error for the FVFG measures for RS was
+/- 11.1%). Although there is some scatter of data, there was
apparently little change in the distance between the medial
edges of the ventricular folds until the EGGW values of
adduction reached approximately 0.65, consistent with the IAG
measure, beyond which there was a sharp change in the measure
corresponding to the inferred hyperadduction. Figure 13 shows
the corresponding measure FVFG/RAW for subject BB. Here the
data suggest that the distance between the false vocal folds
begins to decrease at a value of EGGW of approximately 0.41, a
smaller number than for subject RS. A decrease in the FVFG
measure may suggest greater contraction of the superior
portions of the thyroarytenoid muscle lateral to the
ventricular folds.

Data for the posterior glottal length POGL (ref.
Fig. 8) for subject RS are given in Figure 14 (estimated
maximum error for POGL data was ±10.8%). POGL
values linearly decreased as adduction (EGGW) increased
until (once again) about the 0.65 value, beyond which the
values dropped sharply. The distance decreased as a result
of the bunching of the soft tissue on the posterior wall and
greater posterior touch of the medial arytenoid surfaces at
and posterior to the vocal processes. The POGL/RAW
values versus EGGW for subject BB, Figure 15, show a
similar trend as for RS, that is, a relatively linear relationship
of decrease of the posterior glottal length with increasing
adduction over the same range of EGGW values.

Relationship to Theoretical Adduction Measures

Under the useful assumption that vocal fold tissue
moves  in  a  sinusoidal  manner  or  that  glottal  area  can  be 
modelled  by  a  truncated  sinusoid,  both  Titze  (1988)  and 
Rothenberg  and  Mahshie  (1988)  describe  the  abduction 
quotient  Qa  and  abduction  measure  D,  respectively,  with 
respect  to  a  diagram  similar  to  Figure  16.  Tissue  movement 
or  glottal  area  is  represented  by  the  sinusoidal  waveform, 
tissue  contact  by  the  baseline  value  zero,  the  distance  of  the 
vocal  process  of  one  arytenoid  from  the  midline  by  W/2,  and 
the  amplitude  of  motion  of  the  vocal  fold  by  A. 

Relative to Fig. 16, Titze’s abduction quotient is
given by Qa = W/(2A) = -cos(πQo), where Qo = To/T, To
is the time the glottis is open, and T is the period of the cycle.
Rearranging this statement yields
1 - Qo = 1 - (1/π)cos⁻¹(-Qa) [Equation 1].
Figure 17 shows this nonlinear relationship
between Qa and 1 - Qo. Rothenberg and Mahshie (1988)
define their abduction measure D = (1/2)(1 - cos(πQo)),
which also equals (1/2)(1 + Qa). Using the first expression
for D with appropriate substitution of Qa leads to Equation
1. A more direct comparison of D and Qa is
1 - D = (1/2)(1 - Qa), which is also graphed in Fig. 17.
It is shown that abduction quotient Qa values obtained by applying
GLIMPES (Titze, 1984) to EGG recordings from humans
(ref. Fig. 3) range beyond the theoretically expected values;
theoretical values of Qa range from -1 to +1, whereas human
data values are permitted to range from about -1.5 to +1.5
as shown in Fig. 3 [later applications of Qa by Titze (1990)
show wider ranges of Qa than in the 1984 paper]. The form
of the theoretical and actual data curves is not dissimilar in
shape, however.
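
The theoretical relationships among Qa, 1 - Qo and 1 - D can be checked numerically. The following is a minimal sketch, assuming Qa is restricted to its theoretical range of -1 to +1; the function names are illustrative only and are not part of GLIMPES or of either cited paper.

```python
import math


def closed_quotient_from_qa(qa: float) -> float:
    """Return 1 - Qo from Equation 1: 1 - Qo = 1 - (1/pi) * arccos(-Qa)."""
    if not -1.0 <= qa <= 1.0:
        raise ValueError("theoretical Qa must lie between -1 and +1")
    return 1.0 - math.acos(-qa) / math.pi


def abduction_measure_from_qa(qa: float) -> float:
    """Return the abduction measure D written in terms of Qa: D = (1/2)(1 + Qa)."""
    return 0.5 * (1.0 + qa)


def abduction_measure_from_qo(qo: float) -> float:
    """Return D from the open quotient: D = (1/2)(1 - cos(pi * Qo))."""
    return 0.5 * (1.0 - math.cos(math.pi * qo))


if __name__ == "__main__":
    # Tabulate the two theoretical curves over the theoretical range of Qa.
    for qa in (-1.0, -0.5, 0.0, 0.5, 1.0):
        closed = closed_quotient_from_qa(qa)
        one_minus_d = 1.0 - abduction_measure_from_qa(qa)
        print(f"Qa={qa:+.1f}  1-Qo={closed:.3f}  1-D={one_minus_d:.3f}")
```

Under these formulas, Qa = 0 gives 1 - Qo = 0.5 and 1 - D = 0.5, and the two expressions for D agree once Qo is computed from Qa.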

An abduction quotient of Qa=0 would imply that
the vocal processes just touch. Figures 18a and 18b
for the data for subjects RS and BB, respectively,
suggest that the vocal processes were still separated
when GLIMPES gave a value of Qa=0. For subject RS, the
vocal process gap was approximately 0.04 to 0.06 cm when
Qa=0.

Figure 16. Sinusoidal representation of vocal fold movement. A is the
amplitude of motion of one vocal fold. The dotted line represents the
medial glottal closure location. W/2 represents half of the prephonatory
glottal width at the vocal processes. To is the time the glottis is open. The
figure is after Titze (1988) and Rothenberg & Mahshie (1988).

Discussion and Conclusions

A reliable and straightforward measure of glottal
adduction is required for the clinical and training need of
evaluating and establishing adequate phonatory sound within
a wide variety of communication requirements, and for the
determination of the most efficient glottal configuration
from an acoustic and physiological basis (e.g., Titze, 1988;
also ref. Scherer, 1991). This study examined the simple
glottal adduction measure, EGGW. It is derived from the
electroglottographic waveform by taking a ratio of distances
(or times) obtained by an intersection line through
the signal waveform at the 25% height location.
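
As an illustration only, the 25% criterion could be applied to one sampled EGG cycle roughly as follows. This sketch assumes that larger signal values mean greater vocal fold contact, that the input holds exactly one period, and that EGGW is the fraction of the period during which the waveform lies at or above the line drawn at 25% of its peak-to-peak height. The function name and the simple sample-counting approach (no interpolation between samples) are assumptions, not the procedure used in this study.

```python
from typing import Sequence


def eggw_from_cycle(cycle: Sequence[float], height_fraction: float = 0.25) -> float:
    """Estimate EGGW for one EGG period by counting samples on or above the criterion line."""
    if len(cycle) < 2:
        raise ValueError("need at least two samples in the cycle")
    low, high = min(cycle), max(cycle)
    if high == low:
        raise ValueError("a flat waveform has no defined height")
    # Intersection line at the chosen fraction of the peak-to-peak height.
    threshold = low + height_fraction * (high - low)
    # Pulse width (in samples) divided by the period (in samples).
    above = sum(1 for value in cycle if value >= threshold)
    return above / len(cycle)


if __name__ == "__main__":
    # Hypothetical cycle: six samples without contact, then a four-sample contact pulse.
    example_cycle = [0.0] * 6 + [0.5, 1.0, 1.0, 0.5]
    print(eggw_from_cycle(example_cycle))
```

Counting whole samples keeps the sketch short; with real recordings, interpolating the two crossing points of the criterion line would give a finer estimate of the same ratio.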

EGGW was shown to be strongly related (via a
cubic equation) to Titze’s (1984) abduction quotient Qa
(which had been related to visual judgments of adduction in
Scherer et al., 1988b). EGGW also was shown to be strongly
related to (and nearly equal to) measures of the glottal
closed quotient (that is, one minus the value of the glottal
open quotient, 1-Qo) using frame counting from stroboscopic
views, although data were not extensive. Values of
EGGW were greater than 1-Qo obtained by the EGG
derivative method, but in a consistent manner. Since the
EGG derivative method is troublesome for EGG waveforms
without clear derivative peaks, EGGW may be a
more reliable method.

The most significant result of this study may be the
relationship between EGGW and the actual distance between
the vocal processes of the arytenoid cartilages. The
results suggested that EGGW monotonically increased as
the vocal process gap decreased, at least for the comfortable
pitch and loudness instruction used for the two normal
subjects studied. EGGW reached a value between approximately
0.60 and 0.65 when the vocal processes just touched.
EGGW tended to increase as adduction was increased after
the vocal processes touched, suggesting additional compression
of the vocal processes of the arytenoid cartilages.
This study suggests that EGGW eventually may be useful
in inferring actual glottal adduction distances in subjects or
patients.

Figure 17. Relationship between human subject data and theoretically
derived functions. The function between the abduction quotient Qa and
EGGW is empirical (Fig. 3). The curves relating 1-Qo (one minus the open
quotient) and 1-D (one minus the abduction measure) with Qa are
theoretically derived from Titze (1988) and Rothenberg & Mahshie (1988),
respectively.

Figure 18a (top). Relationship between VPG and Qa for subject RS; b
(bottom). Relationship between VPG/RAW and Qa for subject BB.

Other measures of tissue approximation, such as
the interarytenoid gap, the distance between the ventricular
folds, and the length of the open posterior glottis, also
appear to be viable measures of glottal adduction. The
degree of closure of the posterior glottis is important
because it may relate to the degree of hyperadduction (as
suggested here), dynamic stability of arytenoid movement
and interarytenoid pressures (Scherer, Cooper, Alipour-
Haghighi & Titze, 1985), and aeroacoustic influence on the
glottal volume velocity signal affecting vocal tract excitation
(Cranen & Schroeter, 1992). The length of the open
posterior glottis is not easily seen in many people because
of the “overhang” of the cuneiform and corniculate cartilages.
The distance between the ventricular folds may be a relevant
measure, especially when it begins to decrease during
phonation, suggesting the inclusion of additional muscle
forces.

In addition, this study suggests that the range of
values for the EGGW measure for normal phonation (neither
hypoadducted nor hyperadducted) is between about 0.4
and 0.6. Although this conclusion may be drawn from this
study for normal speakers, it may not hold (for example) for
classically trained male operatic voices during singing
where full glottal closure might be the normal expectation
(e.g., Scherer & Titze, 1987; also cf. Howard, Lindsey &
Allen, 1990) and phonation would not be labelled as
hyperadduction with the connotation of abnormal function.
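
The reported range can be expressed as a simple screening rule. The sketch below is hypothetical: the function name and the wording of the labels are assumptions, and the default limits of 0.4 and 0.6 are the approximate values reported here for two normal speakers at comfortable pitch and loudness, so they would need adjusting for other populations such as trained singers.

```python
def describe_eggw(eggw: float, normal_low: float = 0.4, normal_high: float = 0.6) -> str:
    """Label an EGGW value relative to the approximate normal range reported in this study."""
    if not 0.0 <= eggw <= 1.0:
        raise ValueError("EGGW is a ratio and should lie between 0 and 1")
    if eggw < normal_low:
        return "below reported normal range (toward hypoadduction)"
    if eggw <= normal_high:
        return "within reported normal range"
    return "above reported normal range (toward hyperadduction)"


if __name__ == "__main__":
    for value in (0.30, 0.50, 0.70):
        print(value, describe_eggw(value))
```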

The primary caution of this study is that EGGW is
expected to be a function of vocal fold length (decreasing
with greater length as the vertical glottal depth decreases),
subglottal air pressure (increasing with greater pressure as
larger collision forces and greater contact area are expected;
Orlikoff, 1991, demonstrated a significant increase
in the EGGW measure with intensity increase; also ref.
Kempster, Preston, Mack & Larson, 1987, and Dromey,
Stathopoulos & Sapienza, 1992), any vocal fold abnormality
(e.g., increasing with edema, decreasing with bowing;
ref. e.g., Kitzing, 1990; application to neurological disease
problems should be feasible, e.g., Countryman & Ramig,
1993; Ramig, Scherer, Winholtz, Benjamin, Lane & Countryman,
1992), larynx height (if a lowered larynx tends to
lengthen the vertical glottis dimension), and vocal tract
distortions (in the sense that simultaneous tilting of the head
or protrusion of the mandible, as was performed in this study
with subjects BB and RS for laryngeal visualization, may
place the glottis in an atypical posture). The relationship
among EGGW, independent variables of phonation (vocal
fold length, subglottal pressure, and arytenoid adduction),
oscillatory dependencies on normal biomechanical changes
of the vocal fold (e.g., degree of contraction of the vocalis
muscle), and vocal fold abnormalities, needs to be mapped
out. This study was performed at comfortable pitch and
loudness levels only. It is expected that EGGW should be
useful as a glottal adduction measure for comfortable
ranges of pitch and loudness for a subject over time.
Comparison of EGGW values across subjects should be
made carefully. Obtaining valid EGG recordings is an
obvious prerequisite (e.g., Colton & Conture, 1990; Houben,
Buekers & Kingma, 1992).

Introduction

At  least  75%  of  patients  with  Parkinson’s  disease 
have  disordered  speech  and  voice,  with  every  Parkinson’s 
disease  patient  developing  speech  and  voice  disorders  as  the 
disease  progresses  (1, 2).  Reduced  vocal  loudness  may  be 
one  of  the  first  signs  of  Parkinson’s  disease  (3)  and  is  a 
classic  speech  symptom  together  with  monotone,  imprecise 
articulation,  hoarse  and  breathy  voice,  vocal  tremor  and 
short  rushes  of  speech  (3,4).  These  characteristics  have 
been  associated  with  rigidity,  hypokinesia  and  tremor  in  the 
muscles  of  the  speech  mechanism  (Figure  1).  For  example, 
reduced  loudness  has  been  related  to  rigidity  in  laryngeal 
musculature and bowed vocal folds (5). Decreased range of
tongue, lip and jaw movement due to rigidity has been
associated with imprecise articulation (6).

Figure 1. Functional components of the speaking mechanism, showing
areas where the airstream may be valved: (1) muscles and structures
of respiration, (2) larynx, (3) soft palate, (4) tongue blade,
(5) tongue tip, (6) lips, (7) mandible. Adapted from Netsell.

While  previous  approaches  to  speech  treatment 
for  patients  with  Parkinson’s  disease  have  had  limited 
effectiveness  (7,  8,  9),  this  chapter  will  present  a  new 
method  of  intensive  voice  treatment  with  well-documented 
short  and  long-term  efficacy  (10,  11, 12).  The  rationale, 
experimental  documentation  of  efficacy,  key  treatment 
elements  and  considerations  for  implementation  of  this 
approach will be discussed.


Traditional  Approaches  to  Speech  Therapy  for 
Patients  with  Parkinson's  Disease 

The  typical  patient  with  Parkinson’s  disease  who 
seeks  or  is  referred  for  speech  therapy  has  a  moderate  to 
severe  speech  disorder  and  complains  of  reduced  speech 
intelligibility.  He  and  his  family  express  frustration  at  not 
being  able  to  communicate  effectively  and  frequently 
report  that  the  patient  withdraws  from  conversations  and 
many  social  situations  because  people  cannot  understand 
him. 

The  traditional  speech  treatment  approach  for 
patients  with  Parkinson’s  disease  has  involved  therapy 
which  is  delivered  once  or  twice  a  week  in  an  outpatient 
clinic.  Typically  the  focus  of  treatment  has  been  on 
improvement  of  articulatory  precision,  reduction  of  rate, 
and enhancement of intonation (13, 14). Patients, their
families and speech clinicians will report some degree of
speech improvement during the course of therapy, but
carryover or maintenance of the treatment-related changes
once therapy is discontinued is generally disappointing.
Consequently the ability to communicate deteriorates in
many Parkinson’s disease patients as their disease progresses
(15, 16). This communication impairment limits the
Parkinson’s disease patients’ full participation in society.

Intensive Voice Treatment for Parkinson's Disease

In 1987, Ramig and Mead developed a treatment
program for patients with Parkinson's disease which focused
on voice therapy rather than speech therapy. Their
approach was shaped by a number of factors: the high
incidence of disordered voice in Parkinson's disease (e.g.,
Logemann (16) reported 89% of 200 Parkinson's disease
patients had disordered voice), the apparent role of reduced
loudness in reducing patients’ communication intelligibility,
and reports that intensive speech therapy focusing on
phonation has been of value to patients with Parkinson’s
disease (17, 18). Ramig and Mead designed a treatment
program to improve perceptual characteristics of voice and
functional communication by targeting the underlying
laryngeal pathophysiology associated with the voice disorder
(19). This approach is summarized in Table 1. For
example, the breathy, weak voices of patients with
Parkinson's disease have been associated with glottal
incompetence (e.g., bowed vocal folds (5), anterior vocal fold
gaps). A primary therapy goal is to increase loudness and
decrease breathiness by increasing vocal fold adduction.
The monotonous voices of Parkinson's disease patients
have been associated with rigidity in the cricothyroid
muscles. A second goal of therapy is to improve intonation
by increasing cricothyroid muscle activity. The hoarse
voices of Parkinson’s disease patients have been associated
with vocal fold vibratory instability. A third goal of therapy
is to improve voice quality by increasing stability of vocal
fold vibration.


Table 1.

Framework and rationale for initial program of speech therapy administered to forty patients with idiopathic
Parkinson's disease; treatment philosophy is intensive therapy with a focus on increased phonatory
effort and immediate carryover into functional communication.

Perceptual characteristics of speech:
    "Reduced loudness, breathy, weak voice" (Logemann et al., 1978; Aronson, 1985)
Hypothesized laryngeal and/or respiratory pathophysiology:
    Bowed vocal folds (Hansen et al., 1984), rigidity, hypokinesia in laryngeal and/or
    respiratory muscles; reduced adduction; reduced inspiratory, expiratory volumes
    (Critchley, 1981)
Therapy goals and tasks:
    1) increase vocal fold adduction
       - isometric (pushing, lifting) with phonation
       - increase maximum duration vowel phonation at increased intensity
       - think "shout"
       - speak over background noise
    2) increase respiratory support
       - posture
       - deep breath before speak
       - frequent breaths
       - phrasing of words in sentences
Acoustic, physiologic variables measured:
    maximum duration of sustained vowel phonation (sec); vital capacity (L)
Perceptual variables measured:
    loudness; intelligibility

Perceptual characteristics of speech:
    "Reduced pitch variability, monopitch" (Logemann et al., 1978; Aronson, 1985)
Hypothesized laryngeal and/or respiratory pathophysiology:
    Rigidity, cricothyroid muscle (Aronson, 1985)
Therapy goals and tasks:
    1) increase maximum fundamental frequency range
       - high and low pitch scales
       - sustain phonation at highest and lowest pitches
Acoustic, physiologic variables measured:
    maximum range of fundamental frequency (ST); variability of fundamental
    frequency in connected speech (STSD)
Perceptual variables measured:
    monotone; intelligibility

The specific tasks used in treatment were designed
to address these goals through enhanced phonatory effort.
This treatment program has come to be known as The Lee
Silverman Voice Treatment for Parkinson’s Disease (LSVT),
named for the Center in which it was developed in Scottsdale,
Arizona.

Documentation  of  Voice  Treatment  Efficacy 

Given the limited efficacy of previous methods of
speech therapy for Parkinson’s disease patients, an essential
component in the development of The Lee Silverman
Voice Treatment for Parkinson’s Disease was to objectively
quantify pre- to post-treatment improvement as well
as long-term maintenance of treatment-related changes.
Statistically significant increases on the variables maximum
duration of phonation, maximum phonation range,
and fundamental frequency in reading have been documented
following this intensive voice treatment (10). The
magnitudes of these pre- to post-treatment differences were
significantly greater than those measured in an untreated
control group (20). Three to six month follow-up data
support maintenance of these post-treatment increases (12).
Corresponding improvements in perceptual aspects of
speech production, e.g., intelligibility and loudness, have
been reported pre-, post- and follow-up treatment as well
(11).

Ongoing efficacy studies are addressing the underlying
physiologic bases for improved communication in
patients following voice treatment (21). Measurements are
being made of rib cage and abdomen kinematics, vocal fold
closure, intensity, subglottal air pressure and glottal air flow
as well as speech intelligibility before and after intensive
voice treatment. This approach to the documentation of
efficacy is summarized in Figure 2. Findings to date support
statistically significant post-treatment increases in vocal
fold adduction (quantified from videolaryngostroboscopy)
(22) and sound pressure level (quantified from the acoustic
signal) and suggest that increased vocal fold adduction is a
key element in treatment success.

Figure 2. Model for assessment of voice treatment efficacy for patients with
Parkinson's disease. Treatment stimulates increased effort. Corresponding
measures are made of kinematic (respiratory excursions, vocal fold
adduction), aerodynamic (subglottal pressure, maximum flow declination
rate), acoustic (sound pressure level, fundamental frequency variability,
phonatory stability [jitter, shimmer], articulatory acoustics
[spirantization]), perceptual (loudness, intonation, quality and articulation),
intelligibility and functional communication changes following treatment.

Key  Elements  of  Voice  Treatment  for  Parkinson's 
Disease 

The  Lee  Silverman  Voice  Treatment  for 
Parkinson’s  Disease  differs  from  previous  approaches  to 
speech  therapy  in  a  number  of  ways.  The  treatment  focuses 
on  voice  production,  is  intensive  (four  times  a  week  for  one 
month),  and  it  requires  that  patients  be  habituated  to  a  high 
effort  level  during  speech  production. 

The  singular  focus  of  treatment  is  on  increasing 
vocal  effort  in  order  to  enhance  vocal  fold  closure  and 
loudness.  While  Parkinson’s  disease  patients  do  have 
disordered  articulation  and  rate,  the  consistent  focus  for  all 
sixteen  sessions  of  treatment  is  vocal  effort.  It  has  been 
observed  that  even  in  a  mild  patient,  a  consistent  vocal 
effort  focus  during  all  sixteen  sessions  is  necessary  to  reach 
the  habitual  use  of  the  louder  voice.  When  the  focus 
remains  on  increased  phonatory  effort,  vocal  loudness  is 
increased and this effort generalizes to enhanced articulation
as well. Both vocal and articulatory contributions to
enhanced speech intelligibility have been documented
following treatment which focuses on vocal effort alone
(23).

The singular focus on increased vocal effort makes
an immediate impact on vocal loudness and speech intelligibility
with a relatively simple task: “speak loud” or
“shout”. After the first session of treatment, patients are
often able to use a louder voice in simple greetings such as,
“Hi Honey!” In many cases, this provides the initial
positive feedback patients and families need to enhance
their motivation and confidence to focus intently on treatment.

The primary goal of treatment is to elicit a louder
voice with good quality. This is accomplished through
adduction exercises which may include “pushing” exercises
(24) and loud phonation. Once the louder voice is
established, respiratory and laryngeal coordination at increased
loudness levels is practiced. Exercises such as
maximum duration vowel phonation targeting duration,
constant loudness and steadiness are practiced ten to fifteen
times per session with the patient being constantly urged to
“go longer, louder and steadier”. Fundamental frequency
range is another maximum phonatory effort task that is
practiced six to ten times per session. Both of these
maximum effort tasks are carried out during each of the
sixteen sessions and on-line duration and frequency range
data are collected on each patient’s performance. These
daily clinical data are useful for both patient and clinician
reinforcement as well as for documentation of treatment
efficacy for insurance reimbursement.
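
Table 1 expresses fundamental frequency range in semitones (ST). A minimal sketch of how one session's duration and frequency range data might be summarized is given below. The function names and the dictionary layout are hypothetical, and the semitone conversion uses the standard definition 12 * log2(f_high / f_low) rather than any formula stated in this chapter.

```python
import math
from typing import Dict, Sequence


def semitone_range(f_low_hz: float, f_high_hz: float) -> float:
    """Convert a lowest and highest fundamental frequency (Hz) into a range in semitones."""
    if f_low_hz <= 0 or f_high_hz <= 0:
        raise ValueError("frequencies must be positive")
    return 12.0 * math.log2(f_high_hz / f_low_hz)


def summarize_session(vowel_durations_sec: Sequence[float],
                      lowest_f0_hz: float,
                      highest_f0_hz: float) -> Dict[str, float]:
    """Summarize one session's maximum-effort tasks (hypothetical record layout)."""
    if not vowel_durations_sec:
        raise ValueError("at least one sustained vowel trial is required")
    return {
        "trials": float(len(vowel_durations_sec)),
        "longest_vowel_sec": max(vowel_durations_sec),
        "mean_vowel_sec": sum(vowel_durations_sec) / len(vowel_durations_sec),
        "f0_range_st": semitone_range(lowest_f0_hz, highest_f0_hz),
    }


if __name__ == "__main__":
    # Hypothetical values for one patient and one session, for illustration only.
    print(summarize_session([8.0, 10.0, 12.0], lowest_f0_hz=100.0, highest_f0_hz=200.0))
```

Tracking the same summary across the sixteen sessions would give the kind of daily record described above.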

Patients are encouraged to use the same effort
level they use during sustained phonation when they speak.
This focus on increased phonatory effort offers a simple
target for speech production tasks with the focus being
“speak loud” or “shout”. Parkinson’s disease patients have
a well-established difficulty in simultaneously executing
two different movements (25, 26) and many experience
impaired cognitive function (27, 28); this may be one
explanation why previous speech therapy approaches for
Parkinson’s disease patients which focused on multiple
levels of speech production (articulation, rate, intonation)
have not been consistently successful. In the LSVT
approach, the simple focus of increased loudness in
speech production is targeted hierarchically, from words
through phrases and sentences to conversation, on a
week-by-week basis. The goal for dismissal from treatment is
adequate loudness during 85% of spontaneous utterances or
targeted tasks during treatment sessions and reports of 70%
carryover outside of treatment. In most cases, this goal can
be reached by the fourth week of treatment.

Additional keys to treatment success include high
intensity treatment and calibration. The treatment style is
highly energetic and intensive on the part of both the patient
and the clinician. The clinician serves as an energetic
motivator, constantly urging the patient to sustain phonation
“longer, louder, higher and lower”. Parkinson’s
disease patients using increased effort and emotion are able
to override bradykinesia and improve task performance
(29-31). It is speculated that due to phonatory treatment
tasks that stimulate an increased effort level, patients are
able to override their speech mechanism bradykinesia and
improve phonatory and speaking performance. It is therefore
essential that patients use this high effort level throughout
the majority of each treatment session. Thus, individual,
daily treatment is necessary for at least the first
sixteen sessions of treatment. This enables the clinician to
monitor closely the patient’s effort level and continuously
motivate the patient to increase and maintain effort levels.
If the patient does not achieve these effort levels during
treatment, he will surely not achieve them on his own
outside of treatment. In order to habituate this high effort
level outside of the treatment room, homework is assigned
from the first day of treatment.

Habituation and calibration of the patient to the
new phonatory effort level is another key component to
treatment success. Initially, when using the louder voice,
patients complain that they feel like they are “shouting or
talking too loud”. The clinician should view this as a
positive sign because it indicates that the patient is using a
higher effort level. The next critical phase is to teach the
patient that this effort level is desirable and is not “too loud”.
This calibration phase is essential for successful treatment.
If the patient is not comfortable with his louder voice he will
not use it habitually. Figure 3 is frequently used to explain
to patients that the level of effort they now must use to
produce speech with normal loudness is comparable to the
level they used pre-Parkinson’s disease when they talked
loudly or shouted (32). Other activities that are helpful in
this calibration phase include feedback with a tape recorder,
activities in self-monitoring and group therapy (after the
initial sixteen sessions). We have found it useful to provide
patients with objective feedback of their intensity levels
through instruments such as the Voice Light (33).

Figure 3. Loudness model used with Parkinson's disease patients to
demonstrate level of vocal loudness relative to vocal effort. Text shown
in the figure: “As a result of Parkinson's disease, you may need to either
talk loud or shout to have a normal voice.” Adapted from
Carolyn Mead Bonitati (1987).

Critical  Considerations 

Because the majority of Parkinson’s disease patients
have disorders of articulation and rate, a speech
clinician not trained in the LSVT will be tempted to spend
therapy time treating these disorders. This will diffuse the
focus from increased phonatory effort and reduce the
magnitude of the treatment effect. In order to achieve
habituation of the louder voice, it is essential to keep the focus
on phonation throughout all sixteen sessions of treatment. It
has been documented that increased phonatory effort generalizes
to improved articulatory precision without diffusing
treatment focus to articulation (23).

Pushing the Parkinson’s disease patient to a higher
phonatory effort level may be challenging for the clinician.
The style of the treatment is positive, energetic and consistently
high effort. The clinician has the role of “infusing the
patient with enthusiasm.” Given the reduced affect and
low-energy style of many Parkinson’s disease patients,
clinicians may find this to be a challenge. However, it has
been reported that even on days when patients report
medication problems or fatigue, if the clinician takes a
directive, energetic approach, the results are positive.
Patients learn that even when they don’t feel their best they
can still produce intelligible speech. In order to keep the
patient at a high effort level throughout the therapy session,
the clinician must closely monitor the patient’s output and
continuously encourage this increased effort. This may be
awkward for the clinician and she may not push the patient
consistently. If the patient is not pushed to high phonatory
effort levels during 90% of each therapy session, the
resulting improvement in speech and voice will not be
maximal.

The clinician may be concerned that increased
phonatory effort may be abusive to voice production.
Recent data document significantly improved glottal competence
post-treatment with no significant supraglottal
hyperadduction (22). All patients must receive an
otolaryngological examination before initiating this treatment
to rule out any contraindications, and post-treatment
laryngeal examinations are useful to document treatment-
related improvements in glottal competence.

The clinician should not underestimate the significance
of calibration and habituation of the patient to this
new phonatory effort level. The patient may use the louder
voice in the treatment room, but if he is not completely
comfortable with it, he will not use it in functional communication.
Calibration and habituation begin with the first
treatment session and continue daily throughout all sessions.


Once  the  patient  has  demonstrated  the  ability  to 
speak  with  increased  loudness  85%  of  the  time  in  sponta¬ 
neous  conversational  speech  or  during  tasks  in  the  treat¬ 
ment  room,  and  reports  70%  carryover  outside  of  the 
treatment  room,  therapy  can  be  terminated.  This  goal 
requires  no  less  than  sixteen  sessions  of  individual  treat¬ 


ment  We  recommend  to  the  patient  and  his  family  that  be 
continue  to  practice  his  voice  exercises  for  10-15  minutes 
at  least  three  times  a  week.  Many  patients  have  found  a 
video  tape  of  home  exercises  useful  (34). 
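
The discharge criteria described above can be written as a simple decision rule. The following is only an illustrative sketch in Python: the class and field names are hypothetical, and it assumes that the 85% and 70% figures and the sixteen-session figure are all treated as minimum thresholds, which is one reading of the text rather than a published protocol.

```python
from dataclasses import dataclass

# Thresholds taken from the text; treating each as a hard minimum is an assumption.
MIN_SESSIONS = 16
MIN_LOUD_FRACTION_IN_ROOM = 0.85
MIN_REPORTED_CARRYOVER = 0.70


@dataclass
class PatientProgress:
    """Hypothetical record of a patient's progress (names are illustrative)."""
    sessions_completed: int
    loud_fraction_in_room: float   # share of speech at increased loudness, 0..1
    reported_carryover: float      # patient-reported carryover outside the room, 0..1


def meets_termination_criteria(progress: PatientProgress) -> bool:
    """Return True only when every criterion named in the text is satisfied."""
    return (
        progress.sessions_completed >= MIN_SESSIONS
        and progress.loud_fraction_in_room >= MIN_LOUD_FRACTION_IN_ROOM
        and progress.reported_carryover >= MIN_REPORTED_CARRYOVER
    )
```

A rule of this kind is only a bookkeeping aid; the clinical judgment about calibration and habituation described earlier still applies.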

Long-Term Maintenance and Follow-up Treatment

Research data have documented a clear maintenance
of treatment effects for up to six to twelve months
without additional treatment. After six months, the
maintenance varies depending upon the patient. We recommend
that all patients be re-evaluated six months post-treatment.
Given the progressive nature of Parkinson’s disease,
treatment targets may need to be modified. If the voice has
deteriorated six to twelve months after treatment, the most
common observation is that the patient has “fallen out of
calibration” or has forgotten the level of phonatory effort
required for adequate loudness. Frequently, two or three
voice treatment sessions, with encouragement to do homework
(at least three maximum “ahs” at the beginning of the
day to get calibrated for the day), will get the patient back
on track.

Early  Voice  Treatment 

Early in the course of Parkinson’s disease, patients
may have a monotonous voice that is reduced in volume.
If voice treatment is initiated before speech intelligibility is
reduced, patients may be able to develop vocal habits that
will allow them to maintain communication. Improved and
maintained speech intelligibility may enhance and sustain
employability for Parkinson’s disease patients in the
workforce.

Prognostic  Factors 

A number of factors that increase the likelihood
that patients will improve with The Lee Silverman Voice
Treatment have been identified. This does not eliminate the
possibility of improvement for patients who do not have
these characteristics; it may mean that therapy will be more
challenging.

Patients  with  idiopathic  Parkinson’s  disease  and 
the classic hypoadducted voice (bowed vocal folds, anterior
glottal gap) respond well to treatment. The efficacy of
intensive  voice  treatment  on  forms  of  Parkinsonism  as  well 
as  Parkinsonian  patients  with  hypoadducted  voices  is 
being  investigated.  Motivated  patients  who  feel  speech  is 
important  are  good  candidates  for  the  LSVT.  However, 
many  patients  who  were  not  motivated  at  the  beginning  of 
treatment  become  motivated  when  they  learn  that  they  are 
able  to  improve  their  voices.  Patients  who  are  active 
communicators are good candidates; however, patients who
had  withdrawn  from  communication  prior  to  treatment 
report  that  after  therapy  they  talk  more  because  they  feel 
more confident. Patients who are stimulable to generate a
louder  voice  are  also  very  good  candidates.  While  patients 
with  adequate  cognition  respond  well  to  treatment,  positive 
post-treatment  results  have  been  observed  in  patients  who 
were  mildly  or  moderately  demented  (38). 

Patients with Atypical Parkinson's Disease

Parkinsonian  patients  with  atypical  Parkinson's 
disease have been treated with the LSVT, and improvements
have been documented on a case-by-case basis. Pre-
to post-treatment results from a patient who had undergone a
bilateral thalamotomy (34) documented improvements in
phonatory stability but limited long-term carryover. Patients
with laryngeal hyperadduction resulting from either
physical pathology or secondary compensatory behavior
respond well to treatment when the focus is directed toward
increased  respiratory  effort. 

Voice  Treatment  Drug  Treatment  Interaction 

While neuropharmacological interventions have
proven effective in the management of many motor symptoms
of Parkinson’s disease, the speech and voice problems
of these patients are not consistently or significantly alleviated
by these interventions (5,36,37). In fact, in some
cases, drug-related dyskinesias affect respiratory, laryngeal,
and oral motor musculature and are severely detrimental
to speech production. Since it cannot be assumed that
neuropharmacological interventions will enhance speech
production, it is important that patients receive speech therapy
in order to maximize their communication intelligibility.
Research  on  the  interaction  between  neuropharmacological 
treatment  and  The  Lee  Silverman  Voice  Treatment  is 
ongoing. 

Swallowing 

Swallowing  disorders  (dysphagia)  have  been  iden¬ 
tified  in  all  phases  of  the  swallow  (oral  preparatory,  oral, 
pharyngeal  and  esophageal)  in  patients  with  Parkinson’s 
Disease  (39).  While  there  is  a  tendency  for  swallowing 
disorders  to  increase  as  the  disease  progresses,  a  one  to  one 
correlation  has  not  been  established.  Additionally,  it  has 
been  documented  that  patients  with  Parkinson’s  disease 
frequently  aspirate  but  are  unaware  of  their  swallowing 
difficulties  or  aspiration  (40).  Therefore,  early  diagnosis 
through  videofluoroscopy  is  necessary  to  document  the 
presence  of  a  swallowing  disorder.  This  evaluation,  usu¬ 
ally  done  in  collaboration  with  a  speech  pathologist  and 
radiologist,  provides  objective  data  on  the  status  of  the 
swallowing  mechanism.  If  a  disorder  exists,  the  speech 
pathologist can teach the patient compensatory techniques
in order to prevent aspiration or other swallowing difficulties
and monitor the patient’s changes. Maximum phonatory
effort tasks used to increase vocal fold adduction and
improve voice production may be useful in reducing aspiration.
Clinical reports document improved swallowing
and  less  choking  following  the  LSVT. 

Summary 

The  majority  of  patients  with  Parkinson’s  disease 
can  benefit  from  speech  therapy  designed  to  maintain  and 
increase  their  vocal  loudness.  The  Lee  Silverman  Voice 
Treatment for Parkinson’s disease has experimentally documented
short- and long-term effectiveness. The ability to
communicate plays an important role in the self-concept
and well-being of an individual. Therefore, speech treatment
can play a key role in enhancing the quality of life of
a patient with Parkinson’s disease.

The purpose of this chapter is to review some
recent  developments  in  surgical  treatment  of  diseases  of  the 
larynx  in  children.  Recently,  a  collection  of  surgical 
techniques  known  as  laryngeal  framework  surgery  has 
gained attention. Though these techniques have not been
widely used in pediatric otolaryngology, it is appropriate to
examine  them  in  the  context  of  their  potential  and  early 
attempts  at  clinical  application  to  this  field. 

The  pediatric  otolaryngologist  is  called  on  to 
manage  a  variety  of  laryngeal  disorders.  These  may  affect 
one or more laryngeal functions: swallowing, airway
protection,  respiration,  and  phonation.  An  underlying 
theme  for  this  review  is  that  surgical  treatment  of  the  adult 
larynx  in  one  context  may  be  applied  to  the  pediatric  larynx 
in  another.  In  the  adult,  surgical  treatment  with  laryngeal 
framework surgery is generally targeted to correct phonatory
dysfunction. This may not necessarily be the sole focus of
treatment in children, where airway and swallowing concerns
often supersede voice considerations. For example,
treatment  of  glottal  insufficiency  with  laryngeal  framework 
surgical  techniques  may  well  apply  to  children  in  address¬ 
ing  swallowing  dysfunction  while  also  sparing  the  voice. 

A  second  theme  of  this  review  suggests  that  the 
well  described  techniques  of  airway  reconstruction  for 
laryngotracheal  stenosis  in  children  may  also  be  viewed  as 
surgery of the laryngeal framework. As will be discussed,
laryngotracheal reconstructive surgery often affects laryngeal
biomechanics and the voice, though the procedures
were  primarily  designed  for  airway  restoration. 

Some background on the principles of laryngeal
framework surgery is appropriate. In 1974 Isshiki coined
the term “thyroplasty” and systematized a collection of
surgical techniques that alter position, length, and stiffness
of the vocal folds and change the voice through procedures
on the external larynx.1 These techniques have also become
known  as  laryngeal  framework  surgery  .u  A  concept 
underlying  these  procedures  is  avoidance  of  surgical  trauma 
to the vocal fold mucosa. The vocal fold mucosa is essential
to normal voice production. It consists of a pliable “cover”
of epithelium and superficial lamina propria that drapes over
a stiff “body” composed of deep lamina propria (vocal
ligament)  and  vocalis  muscle.4  The  complex  interaction 
between  these  structures  allows  for  propagation  of  a  muco¬ 
sal  traveling  wave.  The  periodic  interruption  of  air  flow 
from  the  lungs  by  the  closing  and  opening  of  the  glottis  due 
to  the  traveling  mucosal  waves  is  the  essence  of  voice 
production.  Delicate  adjustments  in  stiffness,  length,  and 
position  of  the  vocal  folds  allow  the  production  of  a  range 
of  vocal  pitches,  loudness  changes,  and  registers.  Laryn¬ 
geal  framework  surgery  manipulates  these  laryngeal  bio¬ 
mechanical  variables  to  change  glottal  configuration  and 
mucosal  wave  propagation  to  alter  the  voice.3 

Though  voice  alteration  has  been  the  original 
focus  of  laryngeal  framework  surgery,  this  report  attempts 
to  broaden  this  view  in  pediatric  otolaryngology  to  include 
treatment  of  laryngeal  airway  and  swallowing  dysfunction, 
in the context of also restoring or preserving voice. Important
background for this discussion includes pediatric laryngeal
anatomy. The developmental anatomy of the larynx
and vocal folds as it relates to laryngeal framework surgery
will first be reviewed, as well as potential implications for
what any intervention in childhood may yield for the adult.

Anatomy

The elegant studies of Hirano et al provide much
insight into the development of the phonatory larynx. This
section reviews this research and highlights the differences
in pediatric laryngeal anatomy relevant to laryngeal
framework surgery. In 1981, at the second Vocal Fold
Physiology Conference, Hirano et al presented a study on
the growth, development, and aging of the human vocal
folds.1 They studied 88 normal larynges in patients whose
ages ranged from a few hours after birth to 69 years old.
Several gross anatomic and histologic variables were
studied. The length of the entire vocal fold was measured,
as were the lengths of the membranous portion (anterior fold)
and cartilaginous portion (posterior fold including vocal
process and arytenoid).


It  was  found  that,  up  to  the  age  of  10  years,  the 
length  of  the  vocal  fold  did  not  vary  much  between  males 
and  females.  At  ten  years  of  age  the  length  of  the  membra¬ 
nous  portion  of  the  vocal  fold  gradually  increases  for  males, 
up  to  20  years  of  age. 

At  age  10,  the  membranous  fold  is  6  to  8  mm  long 
in  males  and  females.  In  females  the  vocal  fold  will  increase 
in  length  to  8.5  to  12  mm  by  age  20.  In  males  the  length  will 
more than double, to 14.5 to 18 mm by age 20. The length
increase was not dramatic during adolescence, but gradual
(Figure 1). The study by Kahane on morphology of the
prepubertal and pubertal larynx also documented the changes
in vocal fold length with puberty.* When the entire
vocal fold length (membranous and cartilaginous portions)
was measured before and after puberty, the average increase
was 4.2 mm in females and 10.9 mm in males, over twice
as much.

Figure 1. A (top): Length of the entire vocal fold measured in mm for 48
males and 40 females ranging from a few hours to 69 years of age.

B (bottom): Length of membranous portion of the vocal fold in mm for 48
males and 40 females ranging from a few hours to 69 years of age. From
Hirano M, Kurita S, Nakashima T: Growth, development, and aging of the
human vocal folds, in Bless DM, Abbs JH (eds.): Vocal Fold Physiology:
Contemporary Research and Clinical Issues, San Diego, College-Hill
Press, 1983, pg 26. Used by permission.


Figure 2. A (top): Thickness of the mucosa at the midpoint of the membranous
portion for 48 males and 40 females, ranging from a few hours to 69 years
of age. B (bottom): Ratio of thickness of mucosa to length of membranous
portion for 48 males and 40 females, ranging from a few hours to 69 years
of age. From Hirano M, Kurita S, Nakashima T: Growth, development, and
aging of the human vocal folds, in Bless DM, Abbs JH (eds.): Vocal Fold
Physiology: Contemporary Research and Clinical Issues, San Diego,
College-Hill Press, 1983, pg 26. Used by permission.

The cartilaginous portion of the vocal fold also
grows with age. It increases from about 1.25 mm in
newborns to 3 mm in males and 2.5 mm in females. If a ratio
is made of the length of the membranous portion to that of
the cartilaginous portion of the vocal fold, the ratio is about
1.5 in newborns (see Figure 2). It increases to about 4.0 in
adult females and 5.5 in adult males.5 This ratio has value
in understanding the functions of the larynx in children. In
children, the posterior glottis makes up a larger portion of
the glottis. This has been termed by Hirano the
“respiratory glottis”.7 Indeed, respiratory and protective
functions of the larynx play a larger role than phonation in
infants and children. The membranous portion of the vocal
folds is more susceptible to edema in children than in adults, yet because
the membranous folds (the anterior or “phonatory glottis”)
comprise a smaller percentage of the entire glottal area,
these obstructive effects are minimized, serving as a relative
protection.
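
The point about the relatively larger posterior glottis in children follows directly from the ratios quoted above. As a rough illustration only (a sketch in Python; the function name is hypothetical, and the calculation concerns length along the vocal fold, not glottal area), if the membranous-to-cartilaginous length ratio is r, the cartilaginous portion accounts for 1 / (1 + r) of the total length:

```python
def cartilaginous_fraction(membranous_to_cartilaginous_ratio: float) -> float:
    """Share of total vocal fold length taken up by the cartilaginous portion.

    With membranous length M and cartilaginous length C, the ratio is r = M / C,
    so C / (M + C) = 1 / (1 + r).
    """
    if membranous_to_cartilaginous_ratio <= 0:
        raise ValueError("ratio must be positive")
    return 1.0 / (1.0 + membranous_to_cartilaginous_ratio)


# Approximate ratios quoted in the text.
RATIOS = {"newborn": 1.5, "adult female": 4.0, "adult male": 5.5}

if __name__ == "__main__":
    for group, ratio in RATIOS.items():
        share = cartilaginous_fraction(ratio)
        print(f"{group}: cartilaginous portion is about {share:.0%} of fold length")
```

On these figures the cartilaginous portion is about 40% of the fold length in the newborn, against roughly 20% in the adult female and 15% in the adult male.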

Hirano also studied the histology of the vocal folds
in the developing larynx. He has reported extensively on the
layered structure of the vocal fold.'* It is described as having
five distinct layers. The first two layers, the vocal fold
epithelium and superficial layer of the lamina propria (also
known as Reinke’s space), comprise the “cover”. Underneath
these are found the intermediate and deep layers of the
lamina propria (or vocal ligament) and the thyroarytenoid
or vocalis muscle. The deeper layers are called the “body”.
The complex stiffness interaction between the cover and
body facilitates phonation through its range of pitches,
loudness, and registers. This layered structure goes through
extensive maturational changes.7 Up to four years of age the
intermediate  and  deep  layers  of  the  lamina  propria  are  not 
differentiated.  After  four  years  an  immature  vocal  ligament 
is  observed.  There  is  in  childhood  a  much  more  extensive 
density  of  fibroblasts  throughout  the  lamina  propria  than  in 
the  adult  (see  Photo  3;  center-bound  photographic  plate). 
With  growth,  elastic  fibers  of  the  intermediate  layer  de¬ 
velop  and  fibroblasts  decrease  while  collagen  fibers  of  the 
vocal  ligament  form.  By  16  years  of  age  the  layered 
structure  of  the  adult  is  observed.  The  high  density  of 
fibroblasts  in  the  submucosa  of  pediatric  vocal  folds  implies 
that  they  may  be  prone  to  scar  formation  from  surgical 
trauma. 

External  Laryngeal  Developmental  Anatomy 

In addition to the development of endolaryngeal
structures, a study of the growth and development of the
external laryngeal cartilages is helpful to understand the
effects of growth on the larynx and of surgical interventions on
these structures. The most extensive study of this topic was
done by Klock. In two reports, he published
extensive measurements of the anatomic dimensions of the
larynx in infancy and childhood.*-’ It was found that, in
general, the growth of the overall dimensions of the larynx
is  linearly  related  (directly  proportional)  to  crown-heel 
length  (somatic  height).  Laryngeal  growth  is  thus  related 
to  age  only  as  overall  body  growth  relates;  that  is,  a 
sigmoidal  curve  with  acceleration  between  birth  and  three 
years,  then  deceleration,  then  rapid  growth  phase  at  pu¬ 
berty,  especially  in  males.  Kahane  also  documented  the 
changes  in  external  laryngeal  anatomy  resulting  from  pu¬ 
berty.*  Significant  regional  growth  localized  to  the  anterior 
aspect  of  the  thyroid  cartilage  was  measured  in  laryngeal 
specimens  of  pubertal  males,  whereas  other  external  laryn¬ 
geal  measurements  showed  less  dramatic  differences  be¬ 
tween  pubertal  males  and  females. 

A  key  principle  in  laryngeal  framework  surgery  is 
the  relationship  between  external  laryngeal  structures  and 
the  vocal  folds  internally.  Through  numerous  anatomic 
studies  Isshiki  found  that  the  position  of  the  anterior  com¬ 
missure  of  the  vocal  fold  is,  in  general,  half  way  between  the 
thyroid  notch  and  the  inferior  border  of  the  thyroid  lamina.1 IJJ0 
The  vocal  fold  is  in  a  horizontal  plane  at  this  level.  Another 
study by Meiteles et al generally confirmed this relationship.11
In dissection of 10 female and 8 male cadaver
larynges  the  anterior  commissure  was  in  all  cases  at  or 
slightly  above  the  midpoint  between  the  thyroid  notch  and 
inferior thyroid lamina. The posterior end of the vocal fold
showed some variation; in 47% of specimens it sloped
downward  posteriorly  at  the  oblique  line  (the  line  on  the 
thyroid  ala  joining  the  superior  and  inferior  tubercles).  In 
medialization  laryngoplasty  surgery,  window  placement  at 
or  below  the  inferior  third  of  the  oblique  line  was  recom¬ 
mended  to  avoid  medialization  of  the  false  folds. 

Isaacson  published  a  similar  study  in  pediatric 
larynges.11 Ten specimens, ages 10 days to 16 years, were
studied.  The  relationship  of  the  level  of  the  anterior 
commissure  and  the  vocal  fold  relative  to  the  external 
thyroid  cartilage  landmarks  was  consistent  throughout  child¬ 
hood,  and  the  same  as  reported  by  Isshiki.  Thus,  though  the 
size  of  external  laryngeal  dimensions  changes  with  growth, 
the relationship to endolaryngeal structures is maintained.
It should also be recognized that a variety of asymmetries
are present in larynges of all ages. In a recent study, Hirano
et al found no directional preponderance in laryngeal
asymmetries of ten newborns.15 In adults, however, several
trends were present. With age, the right thyroid lamina
tends to tilt laterally, and the left thyroid lamina tends to tilt
medially. Also, the right cricothyroid joint is located more
laterally, posteriorly, and inferiorly than the left joint. The
thyroid cartilage is tilted to the right relative to the cricoid
cartilage. They noted that all adult specimens were of
right-handed individuals, and speculated that differences in
handedness may be related to the laryngeal asymmetries
measured. Also of interest, despite these external asymmetries,
they found that the level of the vocal folds remained
relatively the same, to maintain symmetric vibration.

The effects on laryngeal growth from laryngeal
framework surgery are unknown. However, results extracted
from prior work in laryngotracheal reconstruction
may be applicable. In general, laryngeal growth appears to
be maintained despite surgical intervention. These studies have
been recently reviewed by Cotton.14 Laryngofissure had no
effect on laryngeal growth in 5-week-old dogs.15 Nasal
septal cartilage and mucosal autografts were shown to
increase the circumference of the subglottic space in young
dogs, without affecting later laryngeal growth.1* Cotton’s
work in laryngeal anterior and posterior autogenous auricular
grafts in rabbits demonstrated cartilage graft viability at
both sites, with the posterior cricoid grafts faring better.14
Growth  with  viable  new  cartilage  formation  would  be 
expected. 

Laryngotracheal  Reconstruction  as  Laryngeal 
Framework  Surgery 

Surgical  techniques  for  repair  of  glottic  and 
subglottic  stenosis  have  made  many  advances.17  These 
procedures,  generally  known  as  laryngotracheal  recon¬ 
struction  (LTR),  have  particular  applicability  to  the  pediat¬ 
ric  population  where  laryngotracheal  stenosis  is  a  prevalent 
clinical  problem.  The  goal  of  these  procedures  has  been  to 
restore the airway. However, it can be readily appreciated
that  laryngeal  surgery  designed  to  address  one  aspect  of 
laryngeal  function  may  necessarily  affect  other  functions  of 
the larynx, e.g., phonation and swallowing.1*

Techniques  of  LTR  may  alter  the  position  and 
anatomy  of  the  vocal  folds  and  endolaryngeal  structures 
through  external  surgical  manipulation  of  their  support,  the 
thyroid  cartilage  and  arytenoids.  In  this  way  these  proce¬ 
dures  may  be  also  viewed  as  laryngeal  framework  surgery. 
Examples  of  this  may  be  seen  in  the  commonly  employed 
techniques  of  LTR.  Cartilage  grafts,  usually  from  autog¬ 
enous  rib,  are  popularly  used  in  LTR.14-17  These  may  be 
placed  in  the  anterior  and/or  posterior  cricoid  region. 
Anterior  cartilage  grafts  alter  the  laryngeal  framework  by 
immobilizing  the  action  of  the  cricothyroid  muscles,  which 
lengthen  and  tense  the  vocal  folds.  Highly  placed  anterior 
cartilage  grafts  may  disrupt  the  anterior  commissure  and 
splay the vocal folds apart. Posterior cricoid cartilage grafts
widen  the  posterior  commissure.  This  separates  the  aryte¬ 
noids  and  affects  the  ability  of  the  vocal  folds  to  approxi¬ 
mate  at  the  vocal  processes.  Posterior  cricoid  grafts  may 
also  impair  action  of  the  interarytenoid  muscle.  These 
effects  on  glottal  closure  may  be  more  pronounced  in  the 
pediatric  larynx  since  it  has  a  relatively  larger  posterior 
glottis. 

The  effects  of  cartilage  graft  LTR  procedures  on 
pediatric  voice  have  been  recently  studied.  Several  reports 
have  described  voice  problems  in  these  patients.1**14  Smith 
et al reported on eight patients with voice problems following
LTR.1* The voices were frequently rough, low pitched,
or breathy. Two patients exhibited reverse or inhalatory
phonation. Fiberoptic laryngoscopy and laryngostroboscopy
demonstrated supraglottal phonation in three, glottal
incompetence in two, arytenoid fixation in two, anterior
commissure blunting or widening in three, vertical asymmetry
of the vocal folds in two, and vocal fold scarring (i.e.,
absent mucosal wave) in three. Most patients exhibited
more than one abnormal finding. Though the study group
was not controlled and was likely representative of LTR
patients  with  more  severe  voice  problems,  it  is  comparable 
to  other  reports,55-*5  and  is  indicative  of  the  consequences  for 
voice  that  may  result  from  LTR.  This  can  be  a  significant 
long  term  disability  for  a  child. 

As  Zalzal  has  pointed  out,  the  voice  in  children 
who  are  treated  for  laryngotracheal  stenosis  may  be  af¬ 
fected  by  both  the  underlying  disease  process  and  the 
surgical  treatment  designed  to  correct  the  problem.23  Stenosis 
at  the  level  of  the  free  margin  of  the  vocal  folds  and  scar  of 
the superficial lamina propria will inhibit vocal fold vibration
and are difficult to reconstruct. It is also apparent that
surgery designed to enlarge the laryngeal airway may
adversely affect phonation, which requires glottal closure.
Trends from recent studies indicate some additional factors
that appear to increase the risk of a poor postoperative voice
result  in  children  who  undergo  LTR.  These  include  the  use 
of  posterior  cricoid  cartilage  grafts,  combined  use  of  ante¬ 
rior  and  posterior  grafts,  the  long  term  placement  of 
endolaryngeal  stents,  and  multiple  LTR  procedures.  14-1,J3-24 

Posterior  glottic  and  subglottic  stenosis  can  be 
successfully  treated  with  posterior  cricoid  cartilage  grafts.14-21 
This  technique  can  also  be  used  for  treatment  of  impaired 
vocal  fold  mobility,  such  as  bilateral  vocal  fold  paralysis.1* 
In  the  series  reported  by  Zalzal,  twelve  patients  were  treated 
for posterior laryngeal stenosis with posterior cricoid cartilage
grafts. Each patient’s voice quality was assessed by a
household  member  who  spent  the  most  time  with  the  patient 
before  and  after  surgery  (subjective  perceptual  assess¬ 
ment).11  Of  the  eight  patients  with  preoperative  normal 
voice  quality,  only  two  had  a  postoperative  normal  voice 
quality  with  six  patients  reported  as  having  hoarse  or  husky 
voice  quality.  In  another  series  reported  by  Zalzal  et  al, 
sixteen  patients  had  voice  quality  formally  assessed.15  Only 
four of the nine patients who received posterior grafts had
breathiness, yet these four were the only patients with a
breathy postoperative voice. Smith et al reported on fifteen
patients who underwent “single stage” LTR (no tracheotomy
tube employed or removal of the tracheotomy tube
at initial surgery).1* Of the twelve patients who were
successfully extubated, at informal voice quality assessment
three to six months postoperatively, seven had normal
voice, four had moderate dysphonia, and one had severe
dysphonia. All five of the patients with dysphonia had both anterior and
posterior cricoid cartilage grafts placed. For three of these
five, the surgery was a revision procedure.

The use of endolaryngeal stents to secure cartilage
grafts in place has been well described. However, these
stents appear to injure the voice, especially when used long
term. In Cotton’s large series of 61 patients who underwent
posterior cricoid graft LTR, the duration of stenting was
found to be correlated with postoperative voice assessment
in that better voice results occurred when the duration of
stenting was 12 weeks or less.14 Maddalozzo and Holinger17
reported in a series of 20 children who underwent LTR that
hoarseness was not an infrequent problem in those who
required stenting. In the report of Zalzal et al, all sixteen
patients had a stent placed, and fifteen had aberrant voice
quality.0 The authors failed to find a correlation, however,
between stenting duration and postoperative voice quality.

Several  animal  studies  have  examined  the  effects 
of stent/intubation on the larynx, with implications for
voice problems. In a study in goats of the effect of long-term
endolaryngeal stents on the larynx, disruption of
laryngeal mucosa and underlying tunica elastica, particularly
in the posterior glottis, was observed in preparations
that underwent endolaryngeal stent placement for 3 months.21
Squamous metaplasia of the posterior glottic mucosa was
seen, as well as erosion of the vocal process of the arytenoid
(see Photo 4 on center-bound photographic plate). Epithelial
hyperplasia and fibrous proliferation in the submucosa
anterior to the vocal process were observed. Leonard et al
studied the effect of 7-day intubation in small dogs.”
Larynges  harvested  5  weeks  after  extubation  exhibited 
epithelial  disruption,  hypertrophy  of  the  epithelial  layer, 
and  proliferation  of  subepithelial  connective  tissue.  These 
changes,  although  mainly  in  the  posterior  glottis,  also  were 
observed  in  the  membranous  fold  anterior  to  the  vocal 
process. It would be expected that in the infant and pediatric
larynx endotracheal tubes and stents would tend to contact
more of the membranous folds and anterior glottis. This has
potential implications for injury to the membranous folds.
Because of the abundant and diffuse distribution of fibroblasts
throughout the superficial and deep lamina propria,
the membranous folds (“phonatory glottis”) of the pediatric
larynx  may  be  more  susceptible  to  voice  injury  from 
surgical  trauma,  stents  or  intubation. 

A  summary  of  suggestions  for  minimizing  or 
preventing phonation problems in laryngeal framework
cartilage  graft  surgery  for  pediatric  laryngotracheal  stenosis 
is  given  in  Table  1. 

Pediatric  Applications  of  Laryngeal  Framework 
Surgery 

The approach of Isshiki to voice problems through
external  laryngeal  framework  surgery  and  his  surgical 
techniques  have  been  published  extensively,  including  in  a 
previous  volume  of  Advances.1  The  most  common  and 
frequently  reported  of  Isshiki’s  procedures  is  the  Type  I 
thyroplasty,  also  commonly  known  as  medialization 
laryngoplasty.30-31 Another laryngeal framework
medialization  procedure  is  the  arytenoid  adduction 
laryngoplasty.32-34 The other thyroplasty types described,
which have been used for vocal pitch change or spasmodic
dysphonia, are not generally applicable to children. The
medialization  laryngoplasty  and  arytenoid  adduction  pro¬ 
cedures,  however,  do  have  promise  in  the  treatment  of 
selected  pediatric  patients.  As  compared  with  adult  pa¬ 
tients,  a  different  approach  is  necessary  in  considering  1) 
indications  for  surgery,  2)  surgical  plan,  and  3)  anatomic 
differences  in  the  pediatric  larynx  that  influence  surgical 
technique. 

For  children  who  are  candidates  for  medialization 
laryngoplasty,  aspiration  may  often  be  the  major  symptom 


Table  1.

Laryngotracheal  reconstruction  procedures:  potential  adverse  effects  on  voice  and  techniques  to  avoid  and  minimize  them.
(From  Smith  ME,  Mortelliti  AJ,  Cotton  RT,  et  al:  Phonation  and  swallowing  considerations  in  pediatric  laryngotracheal  reconstruction.
Ann  Otol  Rhinol  Laryngol  1992;  101:731-738.  Used  by  permission)

Laryngotracheal                  Potential Adverse Effects         Techniques
Reconstruction Procedures        on Phonation

Anterior laryngotracheal split   Anterior commissure               Avoid complete
and/or graft                     disruption; vocal fold vertical   laryngofissure, if possible;
                                 asymmetry; cricothyroid           avoid graft placement in
                                 muscle dysfunction;               anterior commissure; perform
                                 supraglottic collapse             exact alignment of anterior
                                                                   commissure

Posterior laryngotracheal        Increased glottic gap; impaired   Avoid excessive graft width;
split and/or graft               arytenoid adduction; arytenoid    use gentle retraction of
                                 subluxation                       hemicricoid

Stents                           Scarring of vocal fold mucosa;    Minimize stenting time; use
                                 impaired arytenoid mobility       single-stage laryngotracheal
                                                                   reconstruction, if possible;
                                                                   stent below vocal folds, if
                                                                   possible

rather  than  voice.  Glottal  insufficiency  may  be  due  to
unilateral  or  bilateral  vocal  fold  paralysis  or  paresis.  Bilat¬ 
eral  vocal  fold  paralysis  in  adults  is  usually  due  to  peripheral 
nerve  injury,  but  may  have  central  etiology  in  head  injured 
or  stroke  patients.  In  infants,  this  problem  usually  has  a 
central  etiology.  There  may  be  associated  cortical  dysfunc¬ 
tion  and  developmental  delay.  Because  of  central  neuro¬ 
logic  dysfunction,  swallowing  may  frequently  be  affected 
and  chronic  aspiration  present.  The  presence  of  a  trache¬
otomy  to  secure  the  airway  will  not  prevent  aspiration,  and
recurrent  pneumonia  may  result.  Vocal  fold  medialization,
by  injection  or  external  augmentation,  has  been  described  as
treatment  for  aspiration  by  several  authors.*3-”  However, 
the  success  of  various  techniques  employed  has  not  been 
systematically  investigated.  Medialization  laryngoplasty 
provides  an  option  for  improving  glottal  competence  to 
minimize  aspiration,  while  attempting  to  avoid  procedures 
such  as  laryngotracheal  separation  that  would  render  the 
patient  totally  aphonic. 

Medialization  laryngoplasty  for  the  treatment  of 
swallowing  and  aspiration  problems  in  selected  pediatric 
patients  has  been  employed  (Cotton  RT,  personal  commu¬ 
nication).  A  case  reported  in  the  literature  by  Isaacson13  is 
illustrative  in  this  regard  and  will  be  reviewed  in  detail.  He 
described  a  14-year-old  patient  who  was  neurologically
impaired  from  a  severe  closed  head  injury  at  age  4.
The  patient  had  been  decannulated  from  a  tracheotomy  at 
age  9.  A  unilateral  vocal  fold  paralysis  had  been  treated 
with  a  Teflon®  injection,  yet  development  of  stridor  after 
the  injection  resulted  in  tracheotomy  replacement.  An
arytenoidectomy  had  then  been  performed,  but  the  patient 
developed  aspiration.  The  boy  remained  dependent  on 
gastrostomy  tube  feedings  and  took  nothing  by  mouth. 
After  referral  to  Isaacson’s  institution,  the  Children’s  Hos¬ 
pital  of  Pittsburgh,  an  operation  was  devised  which  modi¬ 
fied  the  standard  technique  of  medialization  laryngoplasty. 
The  window  in  the  thyroid  ala  was  designed  so  that  the 
depressed  cartilage  window  would  fill  in  the  soft  tissue  and 
provide  bulk  to  the  region  of  the  absent  arytenoid.  A  custom 
silastic  block  was  created  to  secure  this  window.  Following 
surgery  the  patient’s  swallowing  improved  to  the  point 
where  he  could  swallow  solids  and  semi-solids,  but  aspi¬ 
rated  with  liquids.  Six  months  later  the  posterior  glottal  gap 
was  noted  to  be  slightly  larger.  An  additional  Teflon® 
injection  to  this  region  of  the  reconstructed  arytenoid 
improved  the  patient’s  swallowing  so  that  liquids  could  be 
swallowed  without  aspiration.  Eventually,  the  patient  no 
longer  required  the  gastrostomy  tube,  but  remained  trache¬
otomy  dependent.

Arytenoid  adduction  laryngoplasty  also  accom¬ 
plishes  vocal  fold  medialization,  through  suture  fixation  of 
the  muscular  process  of  the  arytenoid  to  adduct  the  vocal 
process.  While  medialization  laryngoplasty  effects  closure
of  the  anterior  (membranous)  fold,  arytenoid  adduction  is
best  suited  to  close  the  posterior  glottis.31-37  This  also 
accomplishes  medialization  and  lengthening  of  the  vocal 
fold  as  well  as  appropriate  alignment  for  phonation  in  vocal
folds  on  different  vertical  levels.  Since  the  pediatric  larynx 
has  a  larger  proportion  of  cartilaginous  (posterior  glottis)  to 
membranous  fold  versus  the  adult,  laryngeal  procedures 
which  are  designed  to  close  the  posterior  glottis  may  have 
pediatric  application.  It  would  appear  that  the  design  and 
effect  of  an  arytenoid  adduction  procedure  is  ideally  suited 
for  the  pediatric  larynx.  Preliminary  experience  suggests 
that  it  has,  as  expected,  worked  well  to  improve  swallowing 
but  that  voice  improvement  has  not  been  as  pronounced  and 
dramatic  as  that  seen  in  adults.  This  is  probably  related  to
the  fact  that  even  though  the  arytenoid  is  adducted,  the
larger  posterior  pediatric  glottis  (relative  to  the  anterior
glottis)  still  remains  partially  open  during  phonation,  result¬
ing  in  some  persistence  of  a  breathy  voice.

Case  Report:  A  14-year-old  female  was  suffering 
from  recurrent  aspiration  pneumonias  and  a  near  inability 
to  orally  feed  due  to  aspiration.  She  had  a  left  vocal  fold 
paralysis  due  to  a  neuroblastoma.  The  patient  had  under¬ 
gone  chemotherapy  and  was  now  in  remission.  Due  to  her 
recurrent  aspiration  pneumonias  and  her  very  poor  oral 
intake,  she  was  cachectic  and  emaciated.  Although  she  was 
over  five  feet  tall,  her  body  weight  was  86  pounds.  Laryn¬ 
geal  examination  showed  a  wide  glottic  aperture  with  a 
divergent  (triangular)  glottic  configuration.  Based  on  this 
assessment  a  medialization  thyroplasty  would  not  be  suffi¬ 
cient  in  closing  the  posterior  glottis.33-”  Therefore,  a  left 
arytenoid  adduction  was  performed.  Following  the  proce¬ 
dure,  aspiration  was  resolved,  the  teenager  was  able  to  gain 
weight,  and  a  gastrostomy  tube  was  avoided.  The  disap¬ 
pointing  part  of  this  case  is  that  postoperatively  the  voice 
was  perceptually  breathy,  even  though  measures  of  glottal 
aerodynamics  showed  considerable  improvement.  This
case  illustrates  that  arytenoid  adduction  alone  may  not  be
adequate  to  correct  a  breathy  voice  asso¬
ciated  with  unilateral  vocal  fold  paralysis  in  the  pediatric
patient.  It  does  demonstrate  the  effectiveness  of  arytenoid
adduction  laryngoplasty  in  treatment  of  aspiration  prob¬ 
lems. 

Case  Report:  A  15-year-old  with  an  appropriate
body  weight  and  height  for  age  had  been  treated  for  nodular
sclerosing  Hodgkin’s  disease.  She  had  failed  therapy  and
then  underwent  a  bone  marrow  transplant.  During  the
course  of  therapy,  she  experienced  a  left  vocal  fold  paralysis
and  a  very  breathy,  weak  voice.  She  had  previously  been
quite  active  in  her  high  school  activities  and  felt  that  the 
breathy  voice  was  more  disabling  than  the  rest  of  her 
disease.  After  waiting  nine  months,  she  underwent  an 
arytenoid  adduction  laryngoplasty.  Clinically,  her  voice 
was  improved  and  objective  voice  aerodynamic  measures
corroborated  this  result.  Over  the  following  year  as  she
continued  to  grow,  she  experienced  more  vocal  improve¬
ment.  This  case  is  used  to  illustrate  two  things.  First,  an
arytenoid  adduction  can  be  used  to  treat  a  breathy  voice  in 
the  older  teenager  as  in  adult  disorders  and  second,  contin¬ 
ued  growth  through  puberty  is  likely  to  improve  voice 
results.  This  is  attributed  to  change  in  the  ratio  of  posterior 
glottis  to  anterior  glottis,  described  by  Hirano  et  al.SJ  This 
ratio  declines  rapidly  through  the  first  ten  years  of  life,  and 
also  declines  further  through  the  second  decade.  With 
change  in  this  ratio,  more  glottal  air  flow  during  phonation
would  be  directed  through  the  membranous  glottis  to
increase  glottal  vibration.

These  cases  demonstrate  that  laryngeal  framework
surgery  can  be  adapted  to  treat  problems
affecting  phonation  and  swallowing  in  the  pediatric  patient.
Best  results  have  been  obtained  in  the  teenage  population  as 
opposed  to  those  under  twelve  years.  The  long  term  results 
of  these  procedures  await  further  experience.  Results  will 
likely  be  influenced  by  the  natural  deterioration  of  function 
in  children  with  central  neurological  problems.  A  role  may 
be  found  for  such  procedures  that  ameliorate  symptoms  and 
improve  laryngeal  function  in  these  children. 

Surgical  Technical  Points 

Medialization  laryngoplasty  is  usually  performed 
under  local  anesthesia  in  adults.  This  has  not  been  our 
experience  in  children,  and  general  anesthesia  has  been 
used.  These  children  frequently  have  concomitant  airway
and  swallowing  problems  that  have  necessitated
a  tracheotomy.  However,  selected  patients  may  be
candidates  for  local  anesthesia  and  light  sedation,  espe¬
cially  if  they  are  older  children  who  can  cooperate  for
intraoperative  voice  assessment  during  positioning  of  the
implant.

The  anatomy  of  the  pediatric  larynx  presents 
several  differences  of  which  the  surgeon  must  be  aware  when
performing  medialization  laryngoplasty  or  other  laryngeal 
surgery.  The  cartilage  of  the  larynx  is  soft  and  not  calcified. 
The  angle  between  the  thyroid  ala  is  wide  and  the  midline 
less  distinctly  palpable.  The  thyroid  notch  must  be  care¬ 
fully  identified.  The  notch  may  be  obscured  by  an  overrid¬ 
ing  hyoid  bone,  since  in  infants  and  children  the  larynx  has 
not  fully  descended  in  the  neck. 

For  thyroplastic  medialization  laryngoplasty,  ana¬ 
tomical  reference  points  regarding  the  level  of  the  vocal 
fold  have  been  described  above.  The  size  of  the  window  for 
medialization  and  implant  placement  is  proportionally 
smaller.30  It  is  unnecessary  to  use  a  burr,  saw,  or  drill  in 
cutting  the  cartilage  window.  A  Beaver®  knife  and  otologic
instruments,  such  as  a  canal  wall  knife  and  House  elevator 
(“Gimmick”),  may  work  well  for  these  purposes.  Because 
of  the  soft  cartilage,  care  must  be  taken  in  securing  the
implant  in  place. 


Arytenoid  adduction  in  the  pediatric  population  is 
a  more  tedious  procedure  than  thyroplastic  medialization. 
The  smaller  size  of  the  pediatric  larynx  makes  this  proce¬ 
dure  technically  challenging.  The  muscular  process  of  the 
arytenoid  is  not  very  prominent  and  it  is  easy  to  misplace  the 
suture  for  the  muscular  process  too  superiorly.  This  results 
in  prolapse  or  tilting  of  the  arytenoid  cartilage  anteriorly. 
To  make  sure  the  desired  effect  of  arytenoid  adduction  is 
achieved,  the  larynx  must  be  examined  while  the  arytenoid 
is  being  adducted.  This  is  best  done  by  having  the  vocal 
folds  visualized  on  the  monitor  while  the  surgeon  is  apply¬ 
ing  tension  to  the  suture  which  is  adducting  the  arytenoid. 
Furthermore,  once  the  suture  is  tightened  and  the  arytenoid 
is  adducted,  it  is  preferable  not  to  re-intubate  the  patient.
This  requires  cooperation  between  the  anesthesiologist  and 
the  surgeon.  The  airway  can  be  managed  with  a  mask  or 
with  negative  pressure  ventilation.  A  very  small  endotra¬ 
cheal  tube  may  be  acceptable.  We  have  not  had  good 
success  in  performing  this  procedure  under  local  anesthesia 
in  a  pediatric  patient.

Conclusions 

This  report  has  reviewed  basic  laryngeal  investi¬ 
gations  relevant  to  pediatric  laryngeal  framework  surgery. 
Some  clinical  data  have  been  reviewed,  especially  with
regard  to  the  influence  on  voice  in  children  who  undergo 
cartilage  graft  laryngotracheal  reconstruction,  which  may 
be  regarded  as  laryngeal  framework  surgery.  Finally, 
examples  of  the  application  of  medialization  laryngoplasty 
and  arytenoid  adduction  laryngoplasty  in  the  pediatric  age 
group  are  presented  and  issues  discussed  regarding  the 
potential  use  of  these  surgical  techniques  in  children.  The 
definitive  role  in  children  of  the  array  of  phonosurgical 
techniques  known  as  laryngeal  framework  surgery  requires 
further  clinical  experience. 